Continued thread

The #isc25 is over and I have half-recovered from the weekend, too. Time to continue my thread summing up the #SnakemakeHackathon2025!

To me, an important contribution was from Michael Jahn from the Charpentier Lab: a complete re-design of the workflow catalogue. Have a look: snakemake.github.io/snakemake- - the findability of ready-to-use workflows has greatly improved! Also, the description of how to contribute is now easy to find.

A detailed description has been published in the #researchequals collection researchequals.com/collections under doi.org/10.5281/zenodo.1557464

snakemake.github.io: Snakemake workflow catalog
Continued thread

Returning from the #isc25 I will continue this thread with something applicable everywhere, not just on #HPC clusters:

Workflow runs can crash. There are a number of possible reasons. Snakemake offers a `--rerun-incomplete` flag (or `--ri` for short) which lets a user resume a workflow.
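
In practice, after a crash one simply re-issues the original call with the flag appended (the `--cores` value below is just an arbitrary example):

`$ snakemake --cores 8 --rerun-incomplete`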

This contribution from Filipe G. Vieira describes a small fix to stabilize the feature. Not only are incomplete files removed after a crash; it is now ensured that their associated metadata is deleted too before resuming: zenodo.org/records/15490098

Zenodo: Metadata Cleanup for Snakemake
Continued thread

This morning, I am travelling to the #isc25 and hit a minor bug on #researchequals. Hence, no updates in the collection.

But there are still a few to describe, even without adding the latest contributions:

For instance, this one (zenodo.org/records/15490064) by Filipe G. Vieira: a helper function to extract checksums from files and compare them with the checksums Snakemake is already able to calculate. Really handy!
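
To illustrate the idea (this is just a sketch with a made-up sidecar-file convention, not the contribution's actual API), such a comparison could look like this in plain Python:

```python
# Sketch only: compare a file's checksum with an expected value stored in a sidecar file.
import hashlib

def md5sum(path: str) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_expected(path: str, sidecar: str) -> bool:
    """True if the file's MD5 equals the first token of the sidecar file (hypothetical 'data.txt.md5' convention)."""
    with open(sidecar) as fh:
        expected = fh.read().split()[0]
    return md5sum(path) == expected
```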

Zenodo: Extra Input Helper Functions for Snakemake. This Snakemake Hackathon contribution adds several helper functions to infer file sizes and adds functions to calculate hash sums or return a file's content through a callable.
Continued thread

Before I continue uploading - and I do have a couple more contributions to add to the #ResearchEquals collection - first another contribution by Johanna Elena Schmitz and Jens Zentgraf made at the #SnakemakeHackathon2025

One difficulty when dealing with a different scientific question: do I need to re-invent the wheel (read: write a workflow from scratch) just to address my slightly different question?

Snakemake already allowed incorporating "alien" workflows, even #Nextflow workflows, into a desired workflow. The new contribution allows for a more dynamic inclusion of modules - with very few changes.
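
For context, this is roughly what a statically declared module import looks like in a Snakefile today (the module name and path are made up for illustration); the contribution makes such inclusion more dynamic:

```
# Illustrative only: a module declared statically at the top of a Snakefile.
module other_workflow:
    snakefile: "path/to/other/Snakefile"   # hypothetical location of the third-party workflow
    config: config

# Reuse all rules of the imported workflow under a prefix.
use rule * from other_workflow as other_*
```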

Check it out: zenodo.org/records/15489694

Zenodo: Allowing Dynamic Load of Modules for Snakemake. Snakemake modules had to be explicitly defined and loaded at the beginning of a workflow. This limited the flexibility of workflows, particularly when dealing with complex dependency structures or when modules needed to be loaded conditionally based on runtime parameters. This contribution eases the procedure of dynamically adding 3rd-party modules.
Continued thread

Let's take a look at another contribution of Johanna Elena Schmitz and Jens Zentgraf from the #SnakemakeHackathon2025

Snakemake users probably know that

`$ snakemake [args] --report`

will generate a self-contained HTML report, including all the plots and #metadata a researcher's heart longs for.

Now, why trigger this manually? If the workflow runs successfully, we can now write (or configure):

`$ snakemake [args] --report-after-run`

and Snakemake will autogenerate the same report.

For details see doi.org/10.5281/zenodo.1548976

#Snakemake #ReproducibleComputing
#OpenScience

Zenodo: Create Report After Running a Snakemake Workflow. This contribution adds a flag to Snakemake to allow for immediate report creation after a workflow has finished.
Continued thread

One important feature implemented in the #SnakemakeHackathon2025: Snakemake will calculate file checksums to detect changes. If a file changes, the rule producing it needs to be re-executed when the workflow is re-triggered. But what if a file is too big for reasonable checksum calculation? You do not want to wait forever, after all.

This contribution describes the implementation of a threshold users may set: doi.org/10.5281/zenodo.1548940
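
As a minimal sketch of the idea (not Snakemake's actual implementation; the threshold value is made up), the logic boils down to hashing only files below a size limit:

```python
# Sketch only: skip checksum calculation for files above a size threshold.
import hashlib
from pathlib import Path

MAX_CHECKSUM_SIZE = 100 * 1024 * 1024  # hypothetical 100 MiB limit

def maybe_checksum(path: str) -> str | None:
    """Return a SHA-256 hex digest, or None if the file is too large to hash cheaply."""
    p = Path(path)
    if p.stat().st_size > MAX_CHECKSUM_SIZE:
        return None  # caller falls back to e.g. modification-time based change detection
    h = hashlib.sha256()
    with p.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```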

Zenodo: Adjusting Snakemake's Maximum File Size for Checksums. Snakemake calculates checksums for input and output files during DAG building to determine if a rule needs to be re-executed. For huge files, computing these checksums can be a significant performance bottleneck, slowing down the DAG generation process. This paper describes limiting checksum calculation based on file size.

Did you know? During the #SnakemakeHackathon2025 we had a staggering 194 work items!

It took a while, but now we are gathering contribution reports and presenting them online as a ResearchEquals (fediscience.org/@ResearchEqual) collection:

researchequals.com/collections

The first 10 are online and I will post some highlights in the coming weeks.


There are many HPC admins who prohibit using considerable CPU time on login nodes. This is understandable.

I want to take this opportunity to provide a data point. My student measured the accumulated CPU time (user + system) for a 9 h (precisely: 33343 s) run of a Snakemake workflow. It was 225 s, or about 0.67 % - including jobs which were carried out on this login node, e.g. `mv`, `ln` or downloading data.
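
If you want to collect the same data point on your own cluster, wrapping the call in GNU time reports the accumulated user and system CPU seconds (the workflow arguments are placeholders):

`$ /usr/bin/time -v snakemake [args]`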

There is certainly room for improvement. There will always be room for improvement.

But my dear fellow admins: running Snakemake on a login node as a shepherd of jobs will impair nobody's work.

Over time, I will certainly gather more and different statistics, and I will invest time in necessary improvements. Regarding the CPU time for checking job status, however, I believe I have demonstrated that this is pretty high-hanging fruit.

I will continue to find it disturbing when new #HPC cluster users explicitly instruct a program to use only one core/CPU and then complain that the cluster is so slow. Slower than their basement server.

Usually they do not spot their mistake on their own.

But THIS is actually NOT the disturbing part: such users also tend to always use default parameters. This might or might not be the sensible thing to do for their problem. Also, when reading papers, software parameterization is frequently not reported.

We have a long way to go.

We have a new release for the #SLURM support plugin of #Snakemake !

It's a minor feature release which enables custom log directories and auto-deletion of logs of successful jobs (so that zillions of meaningless files do not accumulate!).

Check it out at: github.com/snakemake/snakemake

It took a while to get to this release (stress, family, and sickness took a toll). Hopefully, a future release will not take that long to be realized — the feature request list is considerable. 😉

GitHub: Release v0.12.0 · snakemake/snakemake-executor-plugin-slurm. 0.12.0 (2025-01-08). Features: custom log file behaviour (#159) (cc3d21b); experimenting with automated release post to Mastodon (#166) (c06325d). Documentation: fix headings in further.md (#168) (53...