Ch4_AcerCCC read counts

I’m writing up the Ch4 samples now, and I want a summary table with the mean and standard deviation of raw reads, trimmed reads, and percent alignment rate per sample.

library(tidyverse)

sample_metadata <- read_csv("../input_files/sample_metadata.csv")

sample_metadata %>% 
  select(Location, Genotype, `Tube No.`, `M Seqs (Raw Reads)`, `M Seqs (post-polyA trimming)`, `% Aligned`, `M Aligned`) %>% 
  # `% Aligned` carries a percent sign, so strip it before converting to numeric
  mutate(percent_alignment = as.numeric(gsub("%", "", `% Aligned`))) %>% 
  group_by(Location) %>% 
  # drop any sample with a missing value in any of the selected columns
  drop_na() %>% 
  # read counts are in millions of sequences (M Seqs)
  summarise(count = n(),
            total_raw_reads = sum(`M Seqs (Raw Reads)`),
            total_trimmed_reads = sum(`M Seqs (post-polyA trimming)`),
            average_rawreads_persample = mean(`M Seqs (Raw Reads)`),
            stdev_rawreads_persample = sd(`M Seqs (Raw Reads)`),
            average_trimmedreads_persample = mean(`M Seqs (post-polyA trimming)`),
            stdev_trimmedreads_persample = sd(`M Seqs (post-polyA trimming)`),
            average_alignment_rate = mean(percent_alignment),
            stdev_alignmentrate = sd(percent_alignment))
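
The same kind of per-location table can probably be written more compactly with `across()`, which applies a named list of functions to several columns at once. This is only a sketch on made-up data: the `toy` table, its column names (`raw_reads`, `trimmed_reads`, `pct_aligned`), and its values are hypothetical and are not the Ch4 metadata.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data for illustration only (not the Ch4 samples)
toy <- tibble(
  Location      = c("CCC", "CCC", "Nursery", "Nursery"),
  raw_reads     = c(10, 20, 12, 18),
  trimmed_reads = c(8, 16, 10, 14),
  pct_aligned   = c("40%", "60%", "50%", "70%")
)

toy_summary <- toy %>%
  # strip the percent sign so the column can be averaged
  mutate(pct_aligned = as.numeric(gsub("%", "", pct_aligned))) %>%
  group_by(Location) %>%
  summarise(
    count = n(),
    # mean and sd for each listed column, named e.g. mean_raw_reads, sd_raw_reads
    across(c(raw_reads, trimmed_reads, pct_aligned),
           list(mean = mean, sd = sd),
           .names = "{.fn}_{.col}")
  )

toy_summary
```

Listing the columns explicitly inside `across()` keeps `count` out of the mean and sd calculations, and adding another metric only means adding one column name.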

Of the 17 samples sequenced, three were removed from downstream analyses due to poor quality and low alignment rates.

The CCC samples (N=9) yielded 164.4 million raw reads in total and 140.8 million reads after trimming. Raw reads per sample averaged 18.27 (± 5.34) million (mean ± standard deviation), and trimmed reads per sample averaged 15.64 (± 5.55) million. The average alignment rate was 40.79% (± 22.2%).

The nursery samples (N=7) yielded 126.4 million raw reads in total and 112.1 million reads after trimming. Raw reads per sample averaged 18.06 (± 6.29) million, and trimmed reads per sample averaged 16.01 (± 7.08) million. The average alignment rate was 47.59% (± 20.34%).
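
To avoid retyping the "mean (± sd)" values by hand, a small helper along these lines could build the strings directly from a numeric vector. This is a sketch in base R; `format_mean_sd` is a hypothetical name, not a function from any package, and the input below is made up.

```r
# Hypothetical helper: format a numeric vector as "mean (± sd)"
format_mean_sd <- function(x, digits = 2) {
  m <- formatC(mean(x), format = "f", digits = digits)
  s <- formatC(sd(x), format = "f", digits = digits)
  # "\u00b1" is the plus-minus sign
  paste0(m, " (\u00b1 ", s, ")")
}

# Made-up read counts in millions, for illustration only
format_mean_sd(c(10, 20))
```

It could be called inside `summarise()` on each metric column to produce report-ready text alongside the numeric summary.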

Written on August 20, 2023