Starting over
I think I need to start from the beginning and follow one pipeline that has been proven to work time and time again for one person (or lab group), rather than Frankenstein pieces of different people’s code together, which is what I have tried so far without success. It also helps to find a person with well-annotated code. Thankfully, Hollie Putnam’s lab has tried-and-true methods that go back to the pipelines of Dr. Sam Barr and Dr. Ariana Huffmyer.
The reason I am doing this stems from discussions I had with both Jill Ashey (who pointed me towards the most recent pipeline from Hollie’s lab, Zoe Dellaert’s) and Kevin Wong (who made the good point that a TagSeq pipeline and an RNAseq pipeline are very different things, and that I can’t Frankenstein together code from people using either method).
Thus, I will follow the TagSeq pipelines of Dr. Barr and Dr. Huffmyer, and incorporate any updates that Zoe made in her code, since she is the most recent to run it (I think).
First, let me review what I have attempted so far, as a summary for future me:
I was trying to analyze the Ch2_tempvariability Acer samples and the Ch4_AcerCCC samples the same way, but Ch2 was TagSeq from UT Austin and Ch4 was QuantSeq (which equals RNAseq??) with UM Genomics.
Pipeline included:
- TrimGalore for adapter trimming (from Natalia’s code)
- TrimGalore for polyA tail trimming (from Natalia’s code as well)
- Create STAR index (following a combo of Natalia’s code and Jill’s code)
- STAR alignment (following a combo of Natalia’s code and Jill’s code)
- StringTie + merge GTF + gffcompare + re-assemble with StringTie (following Jill’s code)
- For the Acer CCC samples, I also tried featureCounts (following my code from the Pdam wound healing 2019 project) and tried to quantify directly from STAR (following Natalia’s code)
It is important to note that I don’t think Natalia or Jill used TagSeq… so that could be why mine isn’t working (at least for the Ch2_tempvariability one; who knows about Ch4_AcerCCC).
What I should try (following Sam, Zoe, and Ariana’s pipelines):
- md5 to check file integrity
- Trimming with fastp
- Alignment with HISAT2
- Assembly with StringTie2
- Generate counts matrix with prepDE.py from StringTie2
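To make the planned steps concrete for future me, here is a minimal sketch of how they could be strung together for one single-end sample. This is not Sam’s, Zoe’s, or Ariana’s actual code: the sample name, file paths, output layout, and thread count are all hypothetical placeholders, and the flags are only the basic ones I understand each tool to accept, so they should be checked against each tool’s manual and against the lab’s scripts before running anything. The sketch only computes an MD5 checksum and builds the command lines; it does not execute them.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_commands(sample, fastq, index_prefix, annotation,
                   outdir="output", threads=4):
    """Build per-sample commands as argument lists (nothing is executed).

    All paths and the output layout are hypothetical placeholders.
    """
    out = Path(outdir)
    trimmed = out / f"{sample}.trimmed.fastq.gz"
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    gtf = out / sample / f"{sample}.gtf"
    t = str(threads)
    return [
        # 1. Trim adapters and low-quality bases from single-end reads.
        ["fastp", "-i", str(fastq), "-o", str(trimmed), "-w", t],
        # 2. Align to the reference; --dta tailors output for assemblers.
        ["hisat2", "-p", t, "--dta", "-x", str(index_prefix),
         "-U", str(trimmed), "-S", str(sam)],
        # 3. Sort the alignments, since StringTie expects a sorted BAM.
        ["samtools", "sort", "-@", t, "-o", str(bam), str(sam)],
        # 4. Estimate abundances against the reference annotation (-e -G).
        ["stringtie", str(bam), "-p", t, "-e",
         "-G", str(annotation), "-o", str(gtf)],
    ]


def build_prepde_command(sample_list):
    """Build the prepDE.py call; sample_list has 'sample_id gtf_path' per line."""
    return ["prepDE.py", "-i", str(sample_list),
            "-g", "gene_count_matrix.csv",
            "-t", "transcript_count_matrix.csv"]


if __name__ == "__main__":
    # Hypothetical sample and reference names, for illustration only.
    for cmd in build_commands("sample01", "raw/sample01.fastq.gz",
                              "refs/genome_index", "refs/genome.gff3"):
        print(" ".join(cmd))
    print(" ".join(build_prepde_command("sample_list.txt")))
```

The checksum from `md5sum` would be compared against the value supplied by the sequencing facility before any trimming starts. Each command list could then be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed.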