Acer CCC versus Acer Stress-Hardening Samples

The major difference between the CCC samples and the stress-hardening samples is the library prep. The CCC libraries were prepared with Lexogen 3’ QuantSeq. The stress-hardening libraries followed the TagSeq protocol, a “specialty” library prep that has been specifically tested and refined with coral samples. Both projects were sequenced on the same type of machine with the same target output (NovaSeq S2, SE100), yet the results are vastly different.

I don’t think the difference comes from the extractions, because when I checked samples from both projects on the NanoDrop and Qubit, they all had really high RNA yields. Only a small subset of samples from each project was checked for RIN scores, though, so it is possible that the CCC samples had much lower RIN scores in comparison.

I think the CCC libraries are worse because, in the MultiQC report of the raw reads, some samples have fewer than a million sequences:

[Screenshot: MultiQC, raw CCC reads, sequence counts per sample]

This plot also shows the percent failed versus the sequencing depth.

[Screenshot: MultiQC, raw CCC reads, percent failed versus sequencing depth]
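To check sequencing depth outside MultiQC, a minimal Python sketch like the one below counts reads per FASTQ file and flags samples under one million reads. It assumes standard four-line FASTQ records (no wrapped sequence lines). The function names and the `MIN_READS` threshold are hypothetical, chosen only to mirror the one-million cutoff mentioned above.

```python
import gzip

# Hypothetical threshold mirroring the "fewer than a million sequences" observation.
MIN_READS = 1_000_000


def count_fastq_records(lines):
    """Count FASTQ records in an iterable of lines (each record spans 4 lines)."""
    return sum(1 for _ in lines) // 4


def count_fastq_reads(path):
    """Count reads in a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


def flag_low_depth(counts, min_reads=MIN_READS):
    """Return the sorted names of samples with fewer than min_reads reads."""
    return sorted(name for name, n in counts.items() if n < min_reads)
```

In practice one would build a `{sample: count_fastq_reads(path)}` dictionary over the raw FASTQ files and pass it to `flag_low_depth`.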

Also, some samples have very high levels of overrepresented sequences, which suggests contamination.

[Screenshot: MultiQC, overrepresented sequences in CCC samples]
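The overrepresented-sequence check can be approximated in a few lines. FastQC warns when any single sequence makes up more than 0.1% of reads and fails the module above 1%. The sketch below applies those two thresholds to a list of read sequences. It is a simplification: FastQC itself only tracks sequences seen early in the file and truncates long reads before counting. The `overrepresented` function is hypothetical, not part of FastQC or MultiQC.

```python
from collections import Counter

# FastQC's default thresholds for the overrepresented-sequences module.
WARN_FRACTION = 0.001  # > 0.1% of reads -> warn
FAIL_FRACTION = 0.01   # > 1% of reads -> fail


def overrepresented(sequences, warn=WARN_FRACTION, fail=FAIL_FRACTION):
    """Return (sequence, fraction, status) for sequences above the warn threshold,
    most frequent first. Simplified: counts every read, unlike FastQC."""
    counts = Counter(sequences)
    total = sum(counts.values())
    hits = []
    for seq, n in counts.most_common():
        frac = n / total
        if frac <= warn:
            break  # most_common() is sorted, so nothing later can pass either
        hits.append((seq, frac, "fail" if frac > fail else "warn"))
    return hits
```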

Trimming and polyA tail removal seem to remove a large share of the bases in some samples:

[Screenshot: MultiQC, CCC reads after trimming and polyA tail removal]

These happen to be the same samples with highly overrepresented sequences and low overall sequencing depth.
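To put a number on how much trimming removes, the sketch below strips a trailing poly(A) run from each read and reports the percent of bases removed. This is only an illustration: the 10-base minimum run length and the function names are hypothetical, and it ignores the other trimming steps (adapters, quality) that a real pipeline applies.

```python
import re

# Hypothetical minimum poly(A) run length; real trimmers have their own settings.
POLYA_MIN = 10


def trim_polya(seq, min_len=POLYA_MIN):
    """Strip a trailing run of at least min_len A's from a read."""
    m = re.search(r"A{%d,}$" % min_len, seq)
    return seq[: m.start()] if m else seq


def percent_bases_removed(reads, min_len=POLYA_MIN):
    """Percent of all bases removed by poly(A) trimming across a set of reads."""
    before = sum(len(r) for r in reads)
    after = sum(len(trim_polya(r, min_len)) for r in reads)
    return 100.0 * (before - after) / before if before else 0.0
```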

After STAR alignment to the 2019 Acer genome, I get high alignment rates for some samples but very low rates for others.

[Screenshot: MultiQC, STAR alignment of CCC samples]

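It looks like many reads were unmapped because they were “too short”, which I think indicates that either the sequencing or the cDNA library prep didn’t work well. (In STAR’s log, “too short” means a read’s best alignment covered less than the required fraction of the read, 66% by default, not necessarily that the read itself was short.)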

[Screenshot: STAR alignment results for CCC samples, unmapped reads]
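The STAR numbers behind these plots come from each sample’s `Log.final.out`, where each statistic is written as a label, a `|`, and a value. A minimal parser, assuming that layout, can pull out the unique-mapping rate and the “too short” unmapped rate per sample. The function names are hypothetical.

```python
def parse_star_log(text):
    """Parse STAR's Log.final.out text into {label: value}; percentages become floats."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no value column
        key, value = (part.strip() for part in line.split("|", 1))
        stats[key] = float(value.rstrip("%")) if value.endswith("%") else value
    return stats


def mapping_summary(stats):
    """Return (uniquely mapped %, unmapped 'too short' %) from parsed stats."""
    return (
        stats["Uniquely mapped reads %"],
        stats["% of reads unmapped: too short"],
    )
```

Running this over every sample’s log gives a table that makes it easy to see whether the low-alignment samples are the same ones dominated by “too short” reads.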

This ends up being a problem, though: when I run StringTie and gffcompare (even after filtering out samples 1088, 1100, and 2264), I get issues with the transcriptome assembly and its comparison back to the genome. Specifically, the output says that everything matches the original genome perfectly, which cannot be right.

[Screenshot: StringTie and gffcompare output]
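One way to check the “perfect match” result directly is to tally the class codes gffcompare assigns to transcripts. In gffcompare’s annotated GTF output, each transcript record carries a `class_code` attribute, and `=` means the intron chain matches a reference transcript exactly. If every transcript is `=`, the assembly contains nothing beyond the reference. The sketch assumes that attribute layout; the function names are hypothetical.

```python
import re
from collections import Counter

CLASS_CODE = re.compile(r'class_code "([^"]+)"')


def tally_class_codes(gtf_lines):
    """Count gffcompare class codes over the transcript records of an annotated GTF."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript records carry a class code
        m = CLASS_CODE.search(fields[8])
        if m:
            counts[m.group(1)] += 1
    return counts


def all_exact_matches(counts):
    """True if every transcript has class code '=' (exact reference intron chain)."""
    return bool(counts) and set(counts) == {"="}
```

If the tally really is all `=`, one thing worth checking (a guess, not a diagnosis) is whether StringTie was run with `-e`, which limits output to the reference transcripts supplied with `-G`.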

I’m going to try rerunning the CCC samples and being more stringent about which samples to include. For the Acer stress-hardening samples, the minimum percent of trimmed reads aligned is 44%. Maybe I should set an alignment cutoff, in case StringTie needs a minimum level of alignment? For now, I’ll remove the four samples with the lowest alignment in the CCC STAR alignment graph: 1087, 1097, 1098, and 2383.
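An alignment cutoff like the one proposed above could be applied with a small filter. The sketch assumes per-sample percent-aligned values have already been collected (for example from the STAR logs). The 44% default mirrors the lowest stress-hardening value but is only a placeholder, and the function names are hypothetical. It also includes a helper that returns the n lowest-aligning samples, matching the “remove the bottom four” rule used here.

```python
# Placeholder cutoff: the lowest percent aligned seen in the stress-hardening run.
MIN_ALIGNED_PCT = 44.0


def split_by_alignment(aligned_pct, cutoff=MIN_ALIGNED_PCT):
    """Split samples into (keep, drop) sorted lists by percent of trimmed reads aligned."""
    keep = sorted(s for s, pct in aligned_pct.items() if pct >= cutoff)
    drop = sorted(s for s, pct in aligned_pct.items() if pct < cutoff)
    return keep, drop


def drop_lowest(aligned_pct, n):
    """Return the n samples with the lowest alignment rate, lowest first."""
    return sorted(aligned_pct, key=aligned_pct.get)[:n]
```

A fixed cutoff is easier to apply consistently across projects than a “bottom n” rule, which depends on how many poor samples happen to be in a given run.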

Here are the MultiQC reports for the Acer stress-hardening samples, so I have visuals to compare against:

Raw reads: [Screenshot: MultiQC, stress-hardening raw reads]

Trimmed reads: [Screenshot: MultiQC, stress-hardening trimmed reads]

STAR alignment: [Screenshot: MultiQC, stress-hardening STAR alignment]

Written on July 6, 2023