Long-term feasibility of purely file-based libraries

As I sit here waiting several minutes for Windows Explorer to finish choking on deleting/moving over 10,000 tiny files while I fix an error in my component-generating script, I am wondering about the scalability of sticking with a library based purely on individual files. My library now has around 6000 components after generating many common passive component permutations: resistors from 0402 to 2512 with E96 values from 1Ω to 10MΩ, and some hand-picked E24/E12 ceramic capacitor value ranges in sizes 0402 to 1210.

The stress it puts on the filesystem and OS whenever simple operations are performed seems to lead to various problems:

  • Extreme slowness when manually managing files (e.g. when dealing with generated entities) - and yes, I have an M.2 SSD and the issue remains, so I think this might be an OS problem.
  • Lots of time spent waiting for libraries to scan every time small changes are made
  • Inefficient use of filesystem (though this is not that huge of a deal considering the small overall size of libraries compared to modern storage tech)
  • Lockups of several seconds when searching for certain components, etc. This ties into a larger discussion: the UI thread should never be blocked by slow I/O operations like searching.
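
To illustrate the last point, here is a minimal sketch of running a slow search on a worker thread so the caller returns immediately. It is only an illustration in Python, not LibrePCB code: `slow_search`, `search_in_background` and the sample names are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_search(query, names):
    """Hypothetical stand-in for a slow library search."""
    time.sleep(0.5)  # simulate slow I/O
    return [n for n in names if query.lower() in n.lower()]


def search_in_background(executor, query, names, on_done):
    """Submit the search to a worker thread and return right away."""
    future = executor.submit(slow_search, query, names)
    # The callback normally runs on the worker thread once the search ends.
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


if __name__ == "__main__":
    names = ["R-0402-1k", "C-0603-100n"]
    with ThreadPoolExecutor(max_workers=1) as executor:
        search_in_background(executor, "0402", names, print)
        print("caller is free to keep handling events")
```

In a real GUI toolkit the callback would have to hand the result back to the UI thread (for example through the toolkit's event queue) rather than touch widgets directly.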

I am not proficient enough with software engineering to have a valuable opinion on a solution, but I have some ideas to consider:

  • Keep individual files for publishing libraries to GitHub as this will preserve diffability and all of the existing repos’ structure. Use a compressed format when actually using them with LibrePCB (e.g. concatenated into a single file)
  • Provide a compression/decompression tool (perhaps in the CLI) for library maintainers. The software would use this code internally to load libraries from the Internet.
  • Project-local libraries may stay in the individual file scheme since it’s probably less typical to have 6000 different components in a single project, though people do some crazy stuff and the software should support crazy stuff where reasonably able.
  • Operate on libraries in memory where possible and persist them to disk asynchronously
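
As a rough sketch of the pack/unpack idea above, the following uses Python's standard `zipfile` module to turn a directory tree into a single archive and back, and to load an archive fully into memory. The function names and the choice of ZIP as the container are my own assumptions, not an existing LibrePCB feature or file format.

```python
import zipfile
from pathlib import Path


def pack_library(src_dir, archive_path):
    """Concatenate all files under src_dir into one compressed archive."""
    src = Path(src_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to the library root.
                zf.write(path, path.relative_to(src).as_posix())


def unpack_library(archive_path, dest_dir):
    """Restore the individual-file layout, e.g. before committing to Git."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)


def load_library_in_memory(archive_path):
    """Read every file of the archive into a dict of {relative path: bytes}."""
    with zipfile.ZipFile(archive_path) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        }
```

With something like this, the repository on GitHub would keep the diffable per-file layout, while the application would only ever open one file per library.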

Hi @moshee,

Thanks for sharing your thoughts! This is a difficult topic, and I agree that at some point we will need to think about it. Even without any changes to the file format, there is currently huge potential for making library operations faster. However, at the moment there are much more important things to work on (and very limited resources), so its priority is very low…

Anyway, some comments:

Hm, this sounds a bit strange. At least starting with LibrePCB 1.0, there is no longer any need to create separate components (or devices) for every resistor value, if that is really what you are doing currently. You should generate only one device per package and add parts to it (which are very lightweight compared to devices).

This is actually not related to files. The search function does not perform any file search (this would take minutes :see_no_evil:). We use an SQLite database as an index, which is also used by the search function. However, this whole thing is quite complex and not yet optimized for performance. There is also big potential here just from improving the implementation - no file format change needed.
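
For readers unfamiliar with the approach, here is a minimal sketch of searching an SQLite index instead of the files themselves, using Python's standard `sqlite3` module. The table layout, column names and sample rows are purely illustrative assumptions and not LibrePCB's actual schema.

```python
import sqlite3


def build_index(elements):
    """Build an in-memory index from (path, name, keywords) tuples.

    The schema is hypothetical; a real index would be filled by a library scan.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE elements (path TEXT PRIMARY KEY, name TEXT, keywords TEXT)"
    )
    db.executemany("INSERT INTO elements VALUES (?, ?, ?)", elements)
    db.commit()
    return db


def search(db, term):
    """Return paths of elements whose name or keywords contain the term."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT path FROM elements "
        "WHERE name LIKE ? OR keywords LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_index([
        ("dev/a", "Resistor 0402", "r,smd"),
        ("dev/b", "Capacitor 0603", "c,smd"),
    ])
    print(search(db, "resistor"))
```

A search like this only touches the database, so its cost depends on the query and the index rather than on how many files the library consists of.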

Thanks for the clarifying points.

“At least starting with LibrePCB 1.0, there is no longer any need to create separate components (or devices) for every resistor value, if that is really what you are doing currently. You should generate only one device per package and add parts to it (which are very lightweight compared to devices).”

I seem to have misunderstood the point of the parts. I have been using this to store orderable SKUs of a given part number (e.g. tape and reel/tube or multiple sources). However, the UI does seem set up more in the way you described now that you mention it. I guess this is a feature request to add support for SKUs (I think you have a ticket open for this already).

“However, at the moment there are much more important things to work on (and very limited resources), so its priority is very low…”

I understand. I guess it is very salient to me right now, being in the middle of trying to create a bunch of new parts for a design. I will see if there is anything I can do to help improve things without any format changes.

Correct, parts are mainly used to specify MPNs, not SKUs. SKUs are currently not officially supported, but you could still add arbitrary attributes like DIGIKEY to parts and make them appear in the BOM (if desired).

Probably yes - often when creating/modifying library elements in the editor, the file scanning is finished before you actually need the result of the library rescan :wink: