This is a sponsored article brought to you by MBZUAI.
If you’ve ever tried to predict how a cell’s shape will change after a drug or a gene edit, you know it’s partly science, partly art, and mostly costly trial and error. Imaging thousands of conditions is slow; imaging millions is impossible.
A new paper in Nature Communications proposes a different route: simulate those cellular “after” images directly from the molecular readout, so you can preview the morphology before picking up a pipette. The team calls its model MorphDiff: a diffusion model guided by the transcriptome, the pattern of genes moving up or down after a perturbation.
At a high level, the idea flips a familiar workflow. High-throughput imaging is a proven way to uncover a compound’s mechanism or spot bioactivity, but profiling every candidate drug or CRISPR target is infeasible. MorphDiff learns from cases where both gene expression and cell morphology are known, then uses only the L1000 gene expression profile of the perturbed state to generate realistic post-perturbation images, either from scratch or by converting a control image into its perturbed counterpart. The claim is that the gains in fidelity and mechanism-of-action (MOA) retrieval hold on unseen perturbations across large drug and genetic datasets, approaching what real images deliver.
The MBZUAI researchers who led the work start from a biological observation: gene expression ultimately drives the proteins and pathways that determine what a cell looks like under the microscope. The mapping is not one-to-one, but there are enough shared cues for learning. Conditioning on the transcriptome also offers a practical bonus: there is far more publicly accessible L1000 data than paired morphology data, making it easier to cover a wider swath of perturbation space. In other words, when a new compound comes along, you likely already have its gene signature, and MorphDiff can take advantage of it.
Under the hood, MorphDiff combines two pieces. First, a Morphology Variational Autoencoder (MVAE) compresses five-channel microscope images into a compact latent space and learns to reconstruct them with high perceptual fidelity. Second, a latent diffusion model learns to generate samples in that latent space, conditioning each denoising step on the L1000 vector through attention.
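To make that conditioning step concrete, here is a minimal PyTorch-style sketch of a cross-attention block in which image latents attend to an L1000 signature. It illustrates the general mechanism only; the class, dimensions, and wiring are assumptions, not MorphDiff’s actual architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Lets image latent tokens attend to a transcriptome embedding.

    Hypothetical stand-in for the attention conditioning described in
    the paper; the dimensions here are illustrative, not MorphDiff's.
    """
    def __init__(self, latent_dim=256, cond_dim=978, heads=8):
        super().__init__()
        # L1000 profiles measure ~978 landmark genes; project them
        # into the latent width so they can serve as keys/values.
        self.cond_proj = nn.Linear(cond_dim, latent_dim)
        self.attn = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, z_tokens, l1000):
        # z_tokens: (B, N, latent_dim) spatial latent tokens
        # l1000:    (B, 978) perturbed gene expression signature
        cond = self.cond_proj(l1000).unsqueeze(1)        # (B, 1, latent_dim)
        attended, _ = self.attn(query=self.norm(z_tokens),
                                key=cond, value=cond)
        return z_tokens + attended                       # residual update
```

At each denoising step, blocks like this inject the perturbation signal, so the sampler gradually sculpts a latent that the MVAE decoder can turn into the predicted “after” image.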
Wang et al., Nature Communications (2025), CC BY 4.0
Diffusion is a good fit here: it is intrinsically robust to noise, and the latent-space version is efficient enough to train while preserving image detail. The team supports both gene-to-image (G2I) generation (starting from pure noise, conditioned on the transcriptome) and image-to-image (I2I) transformation (using the same transcriptomic conditioning to nudge a control image toward its perturbed state). The latter requires no retraining thanks to an SDEdit-style procedure, which is useful when you want to explain changes relative to a control.
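For intuition, here is a simplified sketch of that SDEdit-style image-to-image pass. It assumes a trained conditional denoiser `eps_model` and a precomputed `alphas_cumprod` noise schedule; both names are placeholders, and the deterministic DDIM-style update is a generic simplification, not the paper’s exact sampler.

```python
import torch

@torch.no_grad()
def sdedit_i2i(z_control, l1000, eps_model, alphas_cumprod, t_start=400):
    """Nudge a control-image latent toward its perturbed state.

    The SDEdit idea: partially noise the control latent up to step
    t_start, then denoise back down while conditioning every step on
    the L1000 signature. No retraining is needed; only sampling changes.
    """
    a_bar = alphas_cumprod[t_start]
    # Forward-diffuse partway: keeps the coarse layout of the control image.
    z = a_bar.sqrt() * z_control + (1 - a_bar).sqrt() * torch.randn_like(z_control)
    # Reverse-diffuse with transcriptome guidance.
    for t in reversed(range(t_start)):
        eps = eps_model(z, t, cond=l1000)                # predicted noise
        a_t = alphas_cumprod[t]
        z0_hat = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        z = a_prev.sqrt() * z0_hat + (1 - a_prev).sqrt() * eps
    return z  # decode with the MVAE decoder to recover the image
```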
Producing photogenic images is one thing; generating biologically faithful ones is another. The paper leans into both: on the generative side, MorphDiff is benchmarked against GAN and diffusion baselines using standard metrics such as FID, Inception Score, coverage, density, and the CLIP-based CMMD. On the JUMP (genetic) and CDRP/LINCS (drug) test splits, MorphDiff’s two modes typically come in first and second, with key results replicated across multiple random seeds or independent control plates. The result is consistent: improved fidelity and diversity, especially on out-of-distribution (OOD) perturbations, which is where the practical value lies.
The bigger picture is that generative AI has finally reached a fidelity level where in-silico microscopy can stand in for first-pass experiments.
More interesting for biologists, the authors move beyond image aesthetics to morphological features. They extract hundreds of CellProfiler features (texture, intensity, granularity, cross-channel correlation) and ask whether the generated distributions match the ground truth.
In side-by-side comparisons, MorphDiff’s feature cloud matches the real data more closely than baselines like IMPA. Statistical tests show that more than 70 percent of the generated feature distributions are indistinguishable from the real ones, and feature-wise scatter plots show that the model captures how the most strongly perturbed features shift. Importantly, the model also preserves the correlation structure between gene expression and morphological features, with higher agreement with the ground truth than prior methods, evidence that it models more than surface style.
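As a rough illustration of that kind of check, one could compare generated and real CellProfiler feature matrices with a two-sample Kolmogorov-Smirnov test per feature. The paper’s exact statistical procedure may differ, so treat this as a generic sketch:

```python
import numpy as np
from scipy.stats import ks_2samp

def fraction_indistinguishable(real_feats, gen_feats, alpha=0.05):
    """Fraction of features where a two-sample KS test fails to reject
    'same distribution' (higher = generated data closer to real).

    real_feats, gen_feats: (n_cells, n_features) CellProfiler matrices.
    """
    n_features = real_feats.shape[1]
    passed = 0
    for j in range(n_features):
        _, pval = ks_2samp(real_feats[:, j], gen_feats[:, j])
        if pval > alpha:   # cannot distinguish generated from real
            passed += 1
    return passed / n_features

# Toy usage with random data standing in for real measurements.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 300))
gen = rng.normal(size=(500, 300))
print(f"{fraction_indistinguishable(real, gen):.0%} of features indistinguishable")
```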
Wang et al., Nature Communications (2025), CC BY 4.0
The drug results extend that story to thousands of treatments. Using DeepProfiler embeddings as a compact morphology fingerprint, the team shows that MorphDiff’s generated profiles are discriminative: classifiers trained on real embeddings can also sort the generated ones by perturbation, and pairwise distances between drug effects are preserved.
Wang et al., Nature Communications (2025), CC BY 4.0
This matters for the downstream task everyone cares about: MOA retrieval. Given a query profile, can you find reference drugs with the same mechanism? MorphDiff’s generated morphologies not only beat prior image-generation baselines but also outperform retrieval using gene expression alone, and they approach the accuracy you get with real images. In top-k retrieval experiments, the average improvement over the strongest baseline is 16.9 percent, and 8.0 percent over the transcriptome alone, with robustness across multiple k values and metrics such as average precision and fold enrichment. This is a strong hint that simulated morphology carries information complementary to chemical structure and transcriptomics, enough to help find shared mechanisms even when the molecules themselves look nothing alike.
MorphDiff’s generated morphologies not only beat prior image-generation baselines but also outperform retrieval using gene expression alone, and they approach the accuracy you get with real images.
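To make the retrieval setup concrete, here is a generic sketch of top-k MOA retrieval by cosine similarity over morphology embeddings; the scoring details are assumptions rather than the paper’s evaluation code.

```python
import numpy as np

def topk_moa_retrieval(query_emb, ref_embs, ref_moas, k=5):
    """Return the MOA labels of the k nearest reference drugs.

    query_emb: (d,) embedding of a generated morphology profile.
    ref_embs:  (n, d) embeddings of reference drugs (e.g., real images).
    ref_moas:  length-n list of mechanism-of-action labels.
    """
    # Cosine similarity between the query and every reference profile.
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    sims = r @ q
    top = np.argsort(sims)[::-1][:k]
    return [ref_moas[i] for i in top]
```

A retrieval counts as a hit when the query drug’s true MOA appears among the returned labels; averaging hits over many queries gives the kind of top-k accuracy reported in these experiments.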
The paper also lists limitations that point to future improvements. Sampling is still relatively slow, as with most diffusion models; the authors suggest faster samplers to speed up generation. Time and concentration (two factors biologists care about) are not explicitly encoded because of data constraints; the architecture could take them as additional conditions once matched datasets are available. And because MorphDiff relies on perturbed gene expression as input, it cannot link morphology to perturbations that lack transcriptome measurements; a natural extension is to chain it with models that predict gene expression for unseen perturbations (the paper cites GEARS as an example). Finally, generalization inevitably weakens as you move away from the training distribution; larger, better-matched multimodal datasets would help, as would conditioning on more modalities such as chemical structures, text descriptions, or chromatin accessibility.
What does this mean in practice? Imagine a screening team with a large L1000 library but a small imaging budget. MorphDiff becomes a phenotypic copilot: generate predicted morphologies for new compounds, cluster them by similarity to known mechanisms, and prioritize which ones to image for confirmation (a sketch of that loop follows below). Because the model also exposes interpretable feature changes, researchers can peek under the hood. Did ER texture and mitochondrial intensity shift the way we would expect for an EGFR inhibitor? Did two structurally unrelated molecules land in the same phenotypic neighborhood? These are the kinds of hypotheses that accelerate mechanism discovery and drug repurposing.
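A hypothetical version of that triage loop might look like the following; it assumes you already have embeddings for the generated images and for reference drugs with known mechanisms, and every name here is illustrative rather than a published API.

```python
import numpy as np
from sklearn.cluster import KMeans

def prioritize_for_imaging(gen_embs, known_embs, n_clusters=20, per_cluster=3):
    """Cluster generated morphology embeddings and pick, per cluster,
    the compounds farthest from known-mechanism space as the most
    'novel' candidates to confirm on the microscope.

    gen_embs:   (n_compounds, d) embeddings of generated images.
    known_embs: (m, d) embeddings of reference drugs with known MOAs.
    """
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(gen_embs)
    # Novelty score: distance to the nearest known-mechanism profile.
    dists = np.linalg.norm(gen_embs[:, None, :] - known_embs[None, :, :], axis=-1)
    novelty = dists.min(axis=1)
    picks = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        picks.extend(idx[np.argsort(novelty[idx])[::-1][:per_cluster]])
    return picks  # candidate compounds to send for real imaging
```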
The bigger picture is that generative AI has finally reached a fidelity level where in-silico microscopy can stand in for first-pass experiments. We have already seen text-to-image models explode in the consumer domain; here, a transcriptome-to-morphology model shows that the same diffusion machinery can do scientifically useful work, capturing subtle, multi-channel phenotypes and preserving the relationships that make those images more than eye candy. It will not replace the microscope. But if it cuts the number of plates you have to run to find what matters, that is time and money you can spend confirming the hits.


