Chris Miller

Bioinformatics grad student at Baylor College of Medicine. My online home is at http://www.chrisamiller.com/
6 overtimes in 4 games? This series is nuts. #stlblues
RT @pathogenomenick: I think I’ve reached the point in my life where I realise I’m never going to organise my music and photo collections.
I mock, but it honestly sounds like a great alternative science career option. I'm all for more of these types of positions.
https://t.co/I4Ol8vACNx We have too much data and can't possibly find the time to write all these papers - help! #firstWorldLabProblems
Clearly there's been a mistake. You put a cheat sheet for vim on there instead of emacs. @KMeltzSteinberg @lexnederbragt
NSA: Making us all less safe. MT @EFF NSA said to have used Heartbleed bug to gather intelligence for two years https://t.co/lb8e2Ggn0a
Reviewed a software/methods paper with no source code, no implementation, no algorithmic details, and no detailed results. #reject
RT @obigriffith: Haussler "idea that medical centers can hire some postdocs and build their own genome analysis pipeline is BS" scattered applause at #AACR14
RT @StevenSalzberg1: http://www.forbes.com/sites... The Raw Milk movement wants you to drink bacteria-infested milk and especially give it to your babies. Great idea.
RT @theresa_lauren: Look, I'll get caught up on my emails as soon as someone invents a punctuation mark that signifies a level of enthusiasm between "!" and "."
"People ask me what I do in winter when there's no baseball. I'll tell you what I do. I stare out the window and wait for spring." -Hornsby
Some great shots from around St. Louis in this video: http://vimeo.com/70904697
http://www.howdoeshomeopathywork.com Best explanation I've seen yet.
RT @tuuliel: Introducing Authorizer, my script for formatting author & affiliation lists for papers : http://tllab.org/data-so...
Useful reminder to myself: You can't reason someone out of a position that they didn't reason themselves into.
RT @AsaTait: Fun fact: When my kid watches "Terminator" I will have to explain the concept of a phonebook, but not an autonomous robot killing machine.
Confession time: I lie to my almost-two-year-old so that I don't have to share my Lucky Charms.
A: How to calculate bwa mappability score? - http://www.biostars.org/p...
Calculating mappability is really straightforward: Take a reference genome's FASTA file and generate all possible reads of a given length. Map those reads back to the reference genome, and identify which of them map uniquely in the expected location. You'll find some very simple bash/perl code that wraps BWA and then calculates mappability and GC-content scores in 100bp windows here: https://code.google.com/p... Keep in mind that this code only generates single-end reads, but by using paired-end reads, you up your mappability significantly. The code could easily be tweaked to do that. Be sure to think about what insert size is appropriate. There are also some precomputed mappability tracks available for download through the UCSC Table Browser. - Chris Miller
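The procedure described above can be illustrated with a toy sketch. This is not the linked bash/perl code: it assumes single-end reads on the forward strand only, and it swaps the BWA alignment step for exact-match counting, so a read counts as "uniquely mappable" only if its sequence occurs exactly once in the reference. The function names and the window size parameter are hypothetical.

```python
from collections import Counter

def kmer_mappability(seq, read_len):
    """Return 1 for each start position whose read is unique in seq, else 0.

    Assumption: exact string matching stands in for a real aligner such
    as BWA, and only the forward strand is considered.
    """
    seq = seq.upper()
    # Generate every possible read of the given length.
    reads = [seq[i:i + read_len] for i in range(len(seq) - read_len + 1)]
    counts = Counter(reads)
    # A read "maps uniquely" here if its sequence occurs exactly once.
    return [1 if counts[r] == 1 else 0 for r in reads]

def window_scores(seq, read_len, window):
    """Summarize mappability and GC content in fixed-size windows.

    Returns a list of (window_start, fraction_unique_reads, gc_fraction).
    """
    unique = kmer_mappability(seq, read_len)
    scores = []
    for start in range(0, len(unique), window):
        chunk = unique[start:start + window]
        bases = seq[start:start + window].upper()
        gc = sum(b in "GC" for b in bases) / len(bases)
        scores.append((start, sum(chunk) / len(chunk), gc))
    return scores

if __name__ == "__main__":
    toy_reference = "ACGTACGTTTGA"
    for start, mappable, gc in window_scores(toy_reference, read_len=4, window=3):
        print(start, round(mappable, 2), round(gc, 2))
```

A real aligner tolerates mismatches and considers both strands, so scores from an actual BWA run will differ from this exact-match approximation.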
C: Visualization tools for NGS analysis results, suitable for biologists - http://www.biostars.org/p...
The visualizations that will be most useful depend to a huge extent upon the design of the study. There are many, many things you might want to explore in this data, so you're going to have to narrow it down to get reasonable recommendations. - Chris Miller
C: Can anyone suggest me a script based pipeline for exome sequencing with paired end reads generated by Illumina for tumor samples. - http://www.biostars.org/p...
This would probably be better off posted as a new question. It's likely that only the two or three people involved in this thread will notice that you've posted it here. That said, this previous question may help: What is the default quality encoding expected by BWA? - Chris Miller
A: Can anyone suggest me a script based pipeline for exome sequencing with paired end reads generated by Illumina for tumor samples. - http://www.biostars.org/p...
What you're asking here is probably beyond the scope of a Q/A site. To properly review all of these steps and provide feedback and suggestions would take hours. If you really need that level of support, then you're going to want to pay someone a consulting fee to help you get your pipeline set up. If you have specific questions about individual steps or commands, then Biostar can be a great resource, and please do feel free to ask questions. I'd encourage you to look through old posts first, as many of these topics have been addressed individually in the past. - Chris Miller
A: BreakDancer + SquareDancer - http://www.biostars.org/p...
The SquareDancer code is buried in one of the other repos. Here's a direct link to the Perl script: https://github.com/genome... - Chris Miller
A: to open file from dbGAP - http://www.biostars.org/p...
The first result for a search on "ncbi_enc" is this page: http://www.ncbi.nlm.nih.gov/books... It says: "The data files distributed through the dbGaP are all encrypted by NCBI’s special encryption algorithm. These files have a file suffix “.ncbi_enc”, indicating that they are NCBI encrypted files." That page also contains a link to the archive and encryption utilities. - Chris Miller
A: What does the term low pass mean? - http://www.biostars.org/p...
That doesn't really make sense. "Low-pass" generally refers to a genome that's sequenced to a depth under 10x. With this data, you can call germline SNPs, find structural variants, etc. It's not particularly useful for cancer sequencing, though, as somatic variants are difficult to discern, and you can forget about finding subclonal variants. - Chris Miller
A: liftOver bam file - http://www.biostars.org/p...
This is a bad idea. Since the reads were mapped to a different genome assembly, you really need to realign your data. There will undoubtedly be many places where reads map to different places than where liftOver would place them, due to the differences between the assemblies. Convert the BAM back to FASTQ with Picard, then redo the mapping with the aligner of your choice. - Chris Miller
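The realignment route suggested above might look roughly like the following sketch. It only builds the command lines and does not run them. Several things here are assumptions rather than part of the answer: paired-end data, BWA-MEM as the "aligner of your choice", Picard's legacy `KEY=value` argument style, and every file name and the `picard.jar` path (all hypothetical).

```python
def realign_commands(bam, new_ref, prefix, picard_jar="picard.jar"):
    """Build (but do not execute) commands to realign a BAM to a new assembly.

    Assumptions: paired-end reads, Picard SamToFastq for the BAM -> FASTQ
    step, and BWA-MEM as the aligner. File names are placeholders.
    """
    r1 = f"{prefix}_1.fastq"
    r2 = f"{prefix}_2.fastq"

    # Step 1: convert the old alignments back to raw reads.
    to_fastq = [
        "java", "-jar", picard_jar, "SamToFastq",
        f"I={bam}", f"FASTQ={r1}", f"SECOND_END_FASTQ={r2}",
    ]

    # Step 2: index the new reference assembly (needed once per reference).
    index = ["bwa", "index", new_ref]

    # Step 3: map the reads to the new assembly. bwa mem writes SAM to
    # standard output, so the caller is expected to redirect it to a file.
    align = ["bwa", "mem", new_ref, r1, r2]

    return [to_fastq, index, align]

if __name__ == "__main__":
    for cmd in realign_commands("old_build.bam", "new_build.fa", "sample"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check paths and options before handing them to `subprocess.run` or a job scheduler.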
C: Retrieving data from TCGA database - http://www.biostars.org/p...
Yes, your comment is way off topic, but I'll briefly respond to say that I haven't seen this behavior. In grad school, I was in a small lab with no direct connection to TCGA and we had no problems getting access to the protected data. Yes, you need to state a rough research plan so that they can verify that you'll safeguard protected patient information. Yes, you also need to wait until the marker paper is published, as the people who worked so hard to generate the data get the first shot at one general paper describing the dataset. I don't feel like that's unreasonable. - Chris Miller
C: Biostar 1.3.4: Post closing and Poll added - http://www.biostars.org/p...
FWIW, Stack Overflow is revamping how they deal with closing posts: http://blog.stackoverflow.com/2013... - Chris Miller
C: VCF to MAF (Mutation Annotation Format) Conversion ? - http://www.biostars.org/p...
FWIW, MAF is a "standard" format within the TCGA project. Here's documentation: https://wiki.nci.nih.gov/display... - Chris Miller
C: Biopieces is a bioinformatic framework of tools easily used and easily created. - http://www.biostars.org/p...
If things are truly being passed strictly through pipes, this approach seems horribly susceptible to failures. If your plotting step at the end of a long sequence of steps fails, do you have to remap your entire WGS experiment? - Chris Miller
C: Publishing Bioinformatics Results - http://www.biostars.org/p...
If you want to get a good publication "just on computational basis", you're doing it wrong. An algorithm can be interesting or useful, but bioinformatics is about making biological discoveries. If you haven't applied your tool and come up with some new and fascinating insights into biology, then no, you can't expect to publish in a selective journal. - Chris Miller
C: Biostar Ads are now live. Feedback and comments are sought. - http://www.biostars.org/p...
Can we get an option to link the Ad to the job posting? Seems like this would be one of the most common uses of ads for academic users. - Chris Miller
C: grab coordinate of centromeres from UCSC - http://www.biostars.org/p...
If you're referring to the ideogram at the top, then I believe you're overthinking it. There's a lot of heterochromatin near the centromere that isn't necessarily actually the centromere. - Chris Miller