##Homework 3

##NOTE - If you're downloading this from my webpage, please email me for the data files.

###Question 1
Does the proportion of patients treated with a particular drug differ by line of therapy? For example, does the proportion of "First line of therapy" patients on Bortezomib (as compared to all other FLoT patients) differ from the proportion of "Second Line Therapy (after" patients on Bortezomib (again, as compared to all other SLT(a patients)?

```{r}

```

###Question 2
Can you classify patients as being on Dexamethasone vs. Bortezomib based on their gene expression profiles?

```{r}
# To link up the data sets, you'll need to match the colnames of the expression
# data against the public_id of the treatment data.
# The first column in the expression data contains the gene names.
# It also turns out that there are some extra characters you'll need to remove
# from the colnames.
# I've put my solution after the last question, but try to sort out the
# matching yourself first.
```

###Question 3
Do the gene expression patterns differ between "First line of therapy" and all other lines of therapy? If so, how?

```{r}

```

###Question 4
What combination of variables in the treatment data set best predicts the first principal component of the gene expression matrix? Please remove the gene name column first.

```{r}

```

```{r}
# Assuming you loaded the gene expression data as dat and the treatment data as treat
# Keep the first 9 characters of each column name to strip the extra characters
id_adj <- substr(colnames(dat), 1, 9)
# For each expression column, the row index of the matching public_id in treat (NA if none)
mt <- match(id_adj, treat$public_id)
```
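
Once you have the match vector, you still need to use it to line the two data sets up. Below is a minimal sketch of one way to do that, not the only one. It runs on made-up toy data so it won't touch your real `dat` and `treat`: the names `toy_dat`, `toy_treat`, `expr_matched`, `treat_matched`, the `PAT_` ids and the `therapy` column are all hypothetical and only there for illustration. It assumes, as above, that the first column holds the gene names and that the first 9 characters of each remaining column name are the patient id.

```{r}
# Toy stand-ins for the real files (hypothetical ids and values, illustration only)
toy_dat <- data.frame(
  gene = c("G1", "G2"),
  PAT_00001_s1 = c(1.2, 3.4),
  PAT_00002_s1 = c(5.6, 7.8),
  PAT_00009_s1 = c(9.0, 1.1),   # this sample has no treatment record
  check.names = FALSE
)
toy_treat <- data.frame(
  public_id = c("PAT_00002", "PAT_00001"),
  therapy = c("Bortezomib", "Dexamethasone"),
  stringsAsFactors = FALSE
)

# Drop the gene-name column before matching so it cannot produce a stray NA
expr_ids <- substr(colnames(toy_dat)[-1], 1, 9)
toy_mt <- match(expr_ids, toy_treat$public_id)

# Keep only the expression columns that have a treatment record
keep <- !is.na(toy_mt)
expr_matched <- toy_dat[, c(FALSE, keep), drop = FALSE]

# Reorder the treatment rows so they follow the expression columns
treat_matched <- toy_treat[toy_mt[keep], , drop = FALSE]

# Column i of expr_matched now corresponds to row i of treat_matched
stopifnot(identical(substr(colnames(expr_matched), 1, 9), treat_matched$public_id))
```

The same pattern should carry over to the real data if you swap in `dat` and `treat`, but check the `stopifnot()` line on your own objects, since the id length and the position of the gene-name column are assumptions here.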