stringtie2 for Ch2 temp variability

Stringtie is “a fast and efficient assembler of RNA-Seq alignments to assemble and quantitate full-length transcripts. putative transcripts. For our use, we will input short mapped reads from HISAT2. StringTie’s ouput can be used to identify DEGs in programs such as DESeq2 and edgeR” (from Sam’s github)

Here is the first stringtie code I will submit to Pegasus jobs:

#!/bin/bash
#BSUB -J stringtie2_postHISAT2
#BSUB -q general
#BSUB -P and_transcriptomics
#BSUB -o stringtie2_postHISAT2%J.out
#BSUB -e stringtie2_postHISAT2%J.err
#BSUB -u and128@miami.edu
#BSUB -N

module load python/3.8.7

and="/scratch/projects/and_transcriptomics"

cd "/scratch/projects/and_transcriptomics/Ch2_temperaturevariability2023/AS_pipeline/3_trimmed_fastq_files/Acer_aligned_bam_files"

data=($(ls *.bam))

for i in ${data[@]} ;

do \
/scratch/projects/and_transcriptomics/programs/stringtie-2.2.1/stringtie -p 8 -e -B -G /scratch/projects/and_transcriptomics/genomes/Acer/Acerv.GFFannotations.fixed_transcript_take3.gff3 -A ${i}.gene_abund.tab -o ${i}.gtf ${i}
echo "StringTie assembly for seq file ${i}" $(date) ; \
done
echo "StringTie assembly COMPLETE, starting assembly analysis" $(date)

So i think that worked yay.

So now, looking at Zoe, Ariana, and Kevin’s codes, they went straight from the StringTie assembly to the prepDE.py step.

Sam did the stringtie merge and gffcompare step. “This mode [–merge] is used in the new differential analysis pipeline to generate a global, unified set of transcripts (isoforms) across multiple RNA-Seq samples.”

I don’t know how necessary this step is. I did it when I was first following Jill’s pipeline, but…

Oh wait jk when I look at Zoe’s prepDE..py script, there is a stringtie –merge and gffcompare function at the beginning of it. This is the same for Ariana and Kevin lol. So yes this step is necessary.

But we can do it all in one code which is cool.

OH WAIT BUT THERE’s MORE: In Zoe’s code she hid a little note in there: “#Note: the merged part is actually redundant and unnecessary unless we perform the original stringtie step without the -e function and perform #re-estimation with -e after stringtie –merge, but will redo the pipeline later and confirm that I get equal results.”

So… is the merge part necessary or not? lol

I guess we just do it anyways so we have it?

There’s also another note from Zoe: “Had to add -e to stringtie –merge function, Emma, Danielle, Kevin, Ariana and Sam did not do this (there was an error that came up)”

Let’s follow the most recent code (Zoe’s):

#!/bin/bash
#BSUB -J stringtie_mergegtf
#BSUB -q general
#BSUB -P and_transcriptomics
#BSUB -o stringtie_mergegtf%J.out
#BSUB -e stringtie_mergegtf%J.err
#BSUB -u and128@miami.edu
#BSUB -N

module load python/3.8.7

and="/scratch/projects/and_transcriptomics"

cd "/scratch/projects/and_transcriptomics/Ch2_temperaturevariability2023/AS_pipeline/3_trimmed_fastq_files/Acer_aligned_bam_files"

ls *.gtf > gtf_list.txt

${and}/programs/stringtie-2.2.1/stringtie --merge -e -p 8 -G ${and}/genomes/Acer/Acerv.GFFannotations.fixed_transcript_take3.gff3 -o stringtie_acerv_merged.gtf gtf_list.txt
echo "Stringtie merge complete" $(date)

/scratch/projects/and_transcriptomics/programs/gffcompare-0.12.6.Linux_x86_64/gffcompare -r ${and}/genomes/Acer/Acerv.GFFannotations.fixed_transcript_take3.gff3 -G -o merged stringtie_acerv_merged.gtf
echo "GFFcompare complete, Starting gene count matrix assembly..." $(date)

for filename in *bam.gtf; do echo $filename $PWD/$filename; done > listGTF.txt

python ${and}/programs/stringtie-2.2.1/prepDE.py -g gene_count_acerv_matrix.csv -i listGTF.txt 

echo "Gene count matrix compiled." $(date)

Ok let’s try running this in Pegasus.

Gffcompare results:

Screen Shot 2023-07-25 at 11 39 20 AM

This is what Jill said she got for hers, so maybe it’s not concerning after all.

So I got everything up to gffcompare to work, but the prepDE.py script didn’t work.

In the error file it says: “File “/scratch/projects/and_transcriptomics/programs/stringtie-2.2.1/prepDE.py”, line 34 print “Error: line should have a sample ID and a file path:\n%s” % (line.strip()) ^ SyntaxError: invalid syntax”

But I have the sample ID and then the absolute path for each file.

Screen Shot 2023-07-25 at 11 34 26 AM

I’m just going to try to run the one prepDE.py line directly in the terminal and see if it can give me more clues as to what’s wrong with the code.

Ok I think it worked?? idk why it didn’t work in the script but whatever.

Screen Shot 2023-07-25 at 11 42 46 AM

Now I can secure copy this over to my local drive and work with the file there.

Written on July 25, 2023