Moroccan Researchers Map Argan Tree’s Near-Complete Genome in Scientific First


Marrakech – A team of Moroccan and international researchers has produced the first near-complete, chromosome-scale genome of the argan tree, a development expected to advance conservation and breeding efforts for one of Morocco’s most valued natural resources.

The study, published in Scientific Data, a Nature Portfolio journal, was led by scientists from Morocco’s National Institute of Agricultural Research (INRA) in collaboration with the International Center for Biosaline Agriculture (ICBA), Texas Tech University, Mohammed V University, and the International Center for Agricultural Research in the Dry Areas (ICARDA).

The argan tree (Argania spinosa) is native to Morocco’s west-central region and stands as the sole representative of the Sapotaceae family in North Africa.

The argan forest, a UNESCO-designated Biosphere Reserve, provides essential resources, including the globally prized argan oil, livestock forage, and wood. The tree also plays a direct role in combating desertification and soil erosion.

Yet natural argan populations have declined steadily since the 19th century. Climate change, overpopulation, and overexploitation have severely weakened the tree’s ability to regenerate naturally.

The study arrives at a critical time. Morocco’s “Generation Green 2020-2030” initiative targets doubling argan oil production to 10,000 tons by 2030. The plan also calls for planting 50,000 hectares of modern argan fields and rehabilitating 400,000 hectares of existing forest.

How the team built the genome

The team sequenced a tree named TAGUERTE, collected from the Souss Plain Valley in the Taguerte commune. Researchers combined PacBio HiFi long reads with Hi-C scaffolding.

The sequencing generated 86.22 gigabases of HiFi data from 4.7 million reads, 101.25 gigabases of Hi-C data at over 150X coverage, and 41.3 gigabases of RNA-seq data from root, leaf, and seed tissues.
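As a rough sanity check on these figures, sequencing depth can be estimated by dividing total sequenced bases by genome size. The sketch below assumes the study's 645 Mb k-mer estimate as the denominator. The function names are illustrative and do not come from the paper.

```python
# Rough depth-of-coverage arithmetic from the figures reported in the article.
# Assumption: depth is measured against the 645 Mb k-mer genome size estimate.

def coverage(total_gb: float, genome_mb: float) -> float:
    """Sequencing depth = total bases / genome size."""
    return (total_gb * 1e9) / (genome_mb * 1e6)

def mean_read_length(total_gb: float, reads_million: float) -> float:
    """Average read length in base pairs."""
    return (total_gb * 1e9) / (reads_million * 1e6)

GENOME_MB = 645  # k-mer based estimate reported in the study

hifi_depth = coverage(86.22, GENOME_MB)
hic_depth = coverage(101.25, GENOME_MB)
hifi_mean_len = mean_read_length(86.22, 4.7)

print(f"HiFi depth:  ~{hifi_depth:.0f}x")
print(f"Hi-C depth:  ~{hic_depth:.0f}x")
print(f"Mean HiFi read length: ~{hifi_mean_len:,.0f} bp")
```

Under that assumption, the HiFi data works out to roughly 134x depth and the Hi-C data to roughly 157x, in line with the "over 150X" figure reported for Hi-C.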

The assembly resolved two haplotypes, each organized into 11 pseudochromosomes, consistent with the species’ known chromosome count of 2n = 22. Haplotype-1 spans 621 megabases with a scaffold N50 of 50 Mb and GC content of 33.79%. Haplotype-2 spans 615 megabases with a scaffold N50 of 51 Mb and GC content of 33.77%.
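For readers unfamiliar with the N50 metric: it is the length of the shortest scaffold in the smallest set of largest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows. The scaffold lengths are invented toy values, not the argan assembly.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")

# Hypothetical scaffold lengths in Mb (toy data for illustration only).
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 40: the 50 and 40 Mb scaffolds cover 90 of 150 Mb
```

A scaffold N50 of about 50 Mb on a roughly 620 Mb haplotype therefore means that half of the assembled sequence sits in scaffolds at least that long, which is what one would expect when most of the assembly is in chromosome-length pieces.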

Quality assessments confirmed the assembly’s reliability. BUSCO completeness reached 97.8% for Haplotype-1 and 98.1% for Haplotype-2. Merqury quality value (QV) scores reached 75 for both haplotypes, reflecting high consensus accuracy and strong phasing. The assembled genome size closely matched the 645 Mb estimate from k-mer analysis.
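Quality values of this kind are Phred-scaled, so a score Q corresponds to an estimated per-base error probability of 10^(-Q/10). The sketch below converts the reported score into an error rate. The helper name is illustrative, and the result is an estimate derived from the score, not a figure stated in the study.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)

error_rate = qv_to_error_rate(75)
haplotype_bp = 621e6  # Haplotype-1 size reported in the article

print(f"Per-base error probability: {error_rate:.2e}")
print(f"Expected errors across Haplotype-1: {error_rate * haplotype_bp:.1f}")
```

Read this way, a score of 75 implies an estimated error about once every 30 million bases, or on the order of 20 erroneous bases across a 621 Mb haplotype.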

Telomeric repeat sequences (TTTAGGG) were detected at both ends of most chromosomes. A few chromosome ends lacked detectable repeats due to small unresolved gaps. For this reason, the researchers conservatively classified the assemblies as “near-T2T” rather than fully telomere-to-telomere.
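A simple way to picture this check is to count copies of the telomeric motif, or its reverse complement, in a window at each end of a chromosome sequence. The sketch below is a simplified illustration on a synthetic sequence. The window size, the threshold, and the function names are assumptions, and this is not the method the authors used.

```python
MOTIF = "TTTAGGG"    # plant-type telomeric repeat named in the article
REVCOMP = "CCCTAAA"  # reverse complement, as seen on the opposite strand

def telomere_counts(seq: str, window: int = 70) -> tuple:
    """Count motif copies (either strand) in the first and last `window` bases."""
    seq = seq.upper()
    head, tail = seq[:window], seq[-window:]

    def count(s: str) -> int:
        return s.count(MOTIF) + s.count(REVCOMP)

    return count(head), count(tail)

def classify(seq: str, window: int = 70, min_repeats: int = 5) -> str:
    """Label a sequence by how many of its ends carry telomeric repeats."""
    left, right = telomere_counts(seq, window)
    ends = (left >= min_repeats) + (right >= min_repeats)
    return {2: "both ends", 1: "one end", 0: "no ends"}[ends]

# Synthetic toy chromosome: repeats at both ends, filler in the middle.
toy = "CCCTAAA" * 10 + "ACGT" * 50 + "TTTAGGG" * 10
print(telomere_counts(toy))      # (10, 10)
print(classify(toy))             # both ends
print(classify("ACGT" * 100))    # no ends
```

A chromosome that scores "one end" or "no ends" in a check like this would be the kind of case that keeps an assembly in the "near-T2T" category.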

Gene prediction identified 35,183 gene loci producing 39,805 mRNA transcript isoforms and 410 tRNA genes. The average gene length was approximately 5,275 base pairs. Around 75.5% of gene loci had exons supported by RNA-seq alignments.

Functional annotation covered 76.46% of loci, with 24,108 assigned Gene Ontology terms, 29,679 linked to InterPro domains, and 32,706 matched to eggNOG orthologs.

Repetitive elements made up 61.65% of the genome, totaling around 379 Mb. Retrotransposons accounted for 24.24% of the genome, with LTR elements dominating at 18.66%. Gypsy and Copia subfamilies were the most prevalent. DNA transposons constituted 5.86%, simple repeats 2.79%, and unclassified repeats 26.07%.
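The repeat figures can be cross-checked with simple arithmetic. The sketch below uses only the percentages and haplotype sizes reported in the article. Treating the four listed classes as non-overlapping is an assumption, and the variable names are illustrative.

```python
# Percentages of the genome, as reported in the article.
TOTAL_REPEATS = 61.65
classes = {
    "retrotransposons": 24.24,
    "DNA transposons": 5.86,
    "simple repeats": 2.79,
    "unclassified": 26.07,
}
LTR = 18.66  # subset of retrotransposons

listed = sum(classes.values())
unlisted = TOTAL_REPEATS - listed
ltr_share = LTR / classes["retrotransposons"] * 100

print(f"Sum of listed classes: {listed:.2f}%")
print(f"Not itemized in the article: {unlisted:.2f}%")
print(f"LTR share of retrotransposons: {ltr_share:.1f}%")

# Repeat content in Mb for each reported haplotype size.
for name, size_mb in {"Haplotype-1": 621, "Haplotype-2": 615}.items():
    print(f"{name}: ~{TOTAL_REPEATS / 100 * size_mb:.1f} Mb of repeats")
```

The four listed classes sum to 58.96%, so about 2.7% of the genome falls into repeat categories the article does not itemize. The "around 379 Mb" total corresponds to 61.65% of the 615 Mb haplotype.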

The team used tools including hifiasm for phased assembly, Mabs for parameter optimization, purge_dups for removing redundant sequences, and YaHS for Hi-C scaffolding. Manual curation was performed using Juicebox Assembly Tools and PretextView.

All raw sequencing data, genome assemblies, and annotations have been deposited in public repositories. The sequencing reads are available in the Sequence Read Archive under accession number SRP565314.

The two haplotype assemblies are deposited at INSDC under accessions JBPKRC000000000 and JBPKRD000000000. Genome annotations are accessible through Zenodo.

‘A complex and rich genome’

Lead researcher Slimane Khayi, a genomics and bioinformatics scientist at INRA, told Le360 that a first draft of the argan genome was published in 2018, but “it was a preliminary version that constituted an essential scientific step.”

He noted that the argan tree “possesses a complex genome, rich in repetitive sequences and presenting strong internal diversity,” which older technologies could not properly resolve.

“For a long time, available technologies did not allow the correct assembly of such a difficult genome with sufficient precision,” Khayi said. Recent advances in genetic sequencing, particularly PacBio HiFi long reads and Hi-C scaffolding, finally broke through that barrier.

The scientist added that the genomic reference now allows researchers to “identify the genes involved in oil production, drought tolerance, and adaptation to arid conditions.”

He also framed the work as a matter of scientific sovereignty, stating that it “anchors the argan tree in its Moroccan biological identity” and “positions Morocco as a leading scientific authority on this endemic species.”

The study was funded through INRA-Morocco’s Midterm Research Program and the MCGP INRA-ICARDA program.

This genomic resource provides a foundation for understanding genetic diversity, drought resilience, and the biochemical pathways behind argan oil production. It could directly support breeding programs and long-term conservation of the species.
