Biostatistics II: 2008

Thursday, April 10, 2008

Question 3, Chap 9

Here is how I did this problem.

predict e_inj, mu
predict dev, standardized deviance
generate e_rate = e_inj/ childyrs
lowess dev e_inj, bwidth(0.2) msymbol(Oh)
count if dev>2 dev < -2
list dev e_inj inj_dth if dev>2 & inj_dth >= 2

MIKE

Tuesday, April 8, 2008

Chater 9

Julie and I got through the last question, although we abbreviated filling in the rest of the table using lincom, we just did a couple of examples. Here is our code:

set memory 11000
use C:\WDDtext\8.ex.InjuryDeath.dta
xi: glm inj_dth i.age_mom i.lbw i.educ_mom i.income i.illegit i.oth_chld i.race_mom i.pnclate i.age, family(poisson) link(log) lnoffset(childyrs) eform
* A couple examples of using lincom, completing the rest of the table
lincom pnclate _Iage_mom_29- _Iage_mom_30, irr
lincom _Iage_mom_29- _Iage_mom_30, irr
sort age age_mom lbw educ_mom income illegit oth_chld race_mom pnclate
collapse (sum) childyrs= childyrs inj_dth = inj_dth, by(age age_mom lbw educ_mom illegit income oth_chld race_mom pnclate)
xi: glm inj_dth i.age_mom i.lbw i.educ_mom i.income i.illegit i.oth_chld i.race_mom i.pnclate i.age, family(poisson) link(log) lnoffset(childyrs)
predict e_inj_dth, mu
predict dev, standardized deviance
generate e_rate = 1000* e_inj_dth/ childyrs
label variable e_rate "Incidence of Injury Death per Thousand"
lowess dev e_rate, bwidth(0.2) msymbol(Oh)

Monday, April 7, 2008

Devoted class goers

I looked over the homework and the first two questions are really easy. I just changed one line of code (given below) and then ran the exact same program as we did for the the last chapter, question 3. I will start to work on #3 and if anyone has any specific questions on it, feel free to ask.

xi: glm inj_dth i.age_mom i.lbw i.educ_mom i.income i.illegit i.oth_chld i.race_mom i.pnclate i.age, family(poisson) link(log) lnoffset(childyrs) eform

MIKE

Saturday, March 15, 2008

Thanks for your insight Mike

Thanks for taking the time to publish this comment about your and Xiaoming's conversation. I wish I had read this before the midterm (both in class and take home). Now that we have handed in the midterm, I was wondering what you all did to figure out the OR and 95%CI for Question #3, relating to the OR for a 53 year old with 155 mmHG compared to a 47 year old with SBP 113. I racked my brain trying to figure this out...sticking with the model (not using an interaction term) but bringing in all these variables. Anyone care to share?

Monday, March 10, 2008

Good things to know for the test ... perhaps?

I was talking with Xiaoming after class and he helped me out with a few concepts. These might be useful to know for the test.

1) (Looking at Table 7.2 on pg 235) If you want to compare the RR of CHD for a women with 71-80 mm Hg compared to a man with 101-110 mm Hg, then you divide the two numbers. 10.9 / 2.43 = 4.49 . This method also works when adding interaction terms (Table 7.3). You can not just divide the numbers to get the 95% C.I. for the RR because you have to take into account the covariance. Xiaoming said to use the “lincom” command in STATA to figure this out.

2) (Looking at the STATA output on top of pg 249) If you want to compare the relative risk for CHD of a 75 year old compared to a 45 year old (30 year difference) then you take the coefficient of age (1.04863) and raise it to the 30th power.

Mike

Wednesday, March 5, 2008

Aaron, HW6 #4

xi: stcox i.pd if pd~=1

*Risk = 6.045 95% CI = 3.658 – 9.989

xi: stcox i.pd if pd~=2

*Risk=2.018 95% CI = 1.327 – 3.069

Mike: Chapter 6, #5

There were only a few lines of codes for this problem.

summarize entage if pd==0
summarize entage if pd==1
summarize entage if pd==2
oneway entage pd, bonferroni tabulate

Tuesday, February 5, 2008

Mike: Chapt 3, #10 and #11

Below is my do file that I created. If you copy the code and put it in notepad and then save it with a .do extention, you should be able to run the whole code at once. You will need to the change the first line so that the file knows where the dataset is stored.

use E:\BiostatsII\3.ex.funding.dta
*Questions 10 and 11*Perform influence analyses on the other covariates in the model from question 6.
generate log_dol = log( dollars)
generate log_mort = log(mort)
generate log_yrs = log(yrslost)
generate log_hosp = log(hospdays)
generate log_disabil = log(disabil)
regress log_dol log_hosp log_mort log_yrs log_disabil
predict h, leverage
predict z, rstandard
predict t, rstudent
predict deltab1, dfbeta(log_mort)
predict cook1, cooksd
list id disease h z t deltab1 cook1 if deltab1 >.5 deltab1 <-.5
predict deltab2, dfbeta(log_yrs)
predict cook2, cooksd
list id disease h z t deltab2 cook2 if deltab2 >.5 deltab2 <-.5
predict deltab3, dfbeta(log_hosp)
predict cook3, cooksd
list id disease h z t deltab3 cook3 if deltab3 >.5 deltab3 <-.5
*I didn’t remove peptic ulcers because the cook score was so very low; a cook score that warrants examination is .276 and peptic ulcers had .131 cook score.
predict deltab4, dfbeta(log_disabil)
predict cook4, cooksd
list id disease h z t deltab4 cook4 if deltab4 >.5 deltab4 <-.5
* Regress log[dollars] against log[disabil] and log[hospdays].
regress log_dol log_hosp log_disabil
editpreserveset obs 30replace id = 99 in 30replace hospdays = 1000 in 30replace disabil = 1000 in 30
replace log_hosp=log(hospdays) if id==99
replace log_disabil=log(disabil) if id==99
predict yhat1, xb
predict h1, leverage
predict std_yhat1, stdp
predict std_f1, stdf
generate cil_yhat1= yhat1 -invttail(29-2-1, .025)*std_yhat1
generate ciu_yhat1= yhat1 + invttail(29-2-1, .025)*std_yhat1
generate cil_f1 = yhat1 - invttail(29-2-1,.025)*std_f1
generate ciu_f1 = yhat1 + invttail(29-2-1,.025)*std_f1
generate pred_dollars= exp(yhat1)
generate cil_dollarsf1 = exp(cil_f1)
generate ciu_dollarsf1 = exp(ciu_f1)
generate cil_dollarsp1 = exp(cil_yhat1)
generate ciu_dollarsp1 = exp(ciu_yhat1)
list hospdays disabil pred_dollars cil_dollarsp1 ciu_dollarsp1 cil_dollarsf1 ciu_dollarsf1 if id==99

Monday, February 4, 2008

HW 3 #13

We'll see if this is right in class...

#13

generate logdollars = log(dollars)

generate logdisabil = log(disabil)

generate loghospdays = log(hospdays)

regress logdollars logdisabil loghospdays

predict h, leverage

predict z, rstandard

predict t, rstudent

predict deltatab1, dfbeta(logdisabil)

predict cook, cooksd

display invttail(26,.025)

label variable deltatab1 "Delta Beta for log[disabil]"

scatter deltatab1 t, c(i l) s(o i)

sort t

list h z t deltatab1 cook in -3/-1

regress logdollars logdisabil loghospdays if id ~= 1

display (.90955 - .8168)/.13148

Chapter 3: 7-9

Here is my code for 7-9:

Question 7:

predict z, rstandard

predict deltab1, dfbeta(logmort)

predict cook, cooksd

display invttail(22,.025)

sort t

list disease h z t deltab1 cook if abs(deltab1)>0.5

Question 8

label variable logdol "Log thousands of dollars of NIH research funds per year"

label variable logprev "Log disease prevalence rate per 1000"

label variable loghospdays "Log thousands of hospital-days"

label variable logmort "Log disease mortality rate per 1000"

label variable logyrslost "Log thousands of life-years lost"

label variable logdisbil "Log thousands of disability-adjusted life-years lost"

graph matrix logdol loghospdays logmort logyrslost logdisbil, msymbol(Oh) mcolor(gs10)

Question 9

regress logdol loghospdays logmort logyrslost logdisbil if abs(deltab1)<0.5

display (-.0028409 -.170048)/.1852302

Chapter 3, #1-3

*Question 1...
* 95% CI for Age = 1.43 +or- 1.96(.46) = (.5284, 2.3316)
* Does NOT cross zero, so Null is REJECTED.
* 95% CI for Weight = 25.9 +or- 1.96(31) = (-34.86, 86.66)
* Does cross zero, so the Null is NOT REJECTED.
* The CI of the Age factor is quite tight, so the magnitude of effect for every one unit chage in age is quite a bit. The Weight factor CI is quite large, so there has to be an extremely large change in weight to see an effect of magnitude.
* Problem 2. Explore the relationship between dollars and the other covariates listed above. Fit a model that you feel best captures this relationship.
use "(data location)\3.ex.Funding.dta", clear
regress dollars incid preval hospdays mort yrslost disabil
regress dollars incid preval hospdays mort yrslost
regress dollars preval hospdays mort yrslost
regress dollars hospdays mort yrslost
regress dollars hospdays yrslost
* Problem 3. Perform a forward stepwise linear regression of log(dollars) against the following potental covariates: log(incid), log(preval), log(hospdays), log(mort), log(yrslost) and log(disabil).
generate logdol = log( dollars)
generate logincid = log(incid)
generate logprev = log( preval)
generate loghosp = log( hospdays)
generate logmort = log(mort)
generate logyrs = log( yrslost)
generate logdis = log( disabil)
* Threshold = .1
sw regress logdol logincid logprev loghosp logmort logyrs logdis, forward pe(.1)
* Threshold = .2
sw regress logdol logincid logprev loghosp logmort logyrs logdis, pr(.2)
* The covariates selected are logdis loghosp for the cutoff = .1 and logyrs logdis loghosp logmort for the cutoff = .2

#6

I'm confused as to how to calculate the degrees of freedom. I came up with 28 diseases in our regression of logdol loghospdays logmort logyrslost logdisbil. Thus, I calculated the degrees of freedom as (28 - 5 - 1) = 22 based on the results of the regression. What did you all get for the degrees of freedom? Based on 22 as the degrees of freedom, I obtained the following results for #6:

predict yhat,xb
predict h, leverage
predict std_yhat, stdp
predict std_f, stdf
generate cil_yhat = yhat - invttail(28-5-1,.025)*std_yhat
generate ciu_yhat = yhat + invttail(28-5-1,.025)*std_yhat
generate cil_f = yhat - invttail(28-5-1,.025)*std_f
generate ciu_f = yhat + invttail(28-5-1,.025)*std_f
generate cil_sbpf = exp(cil_f)
generate ciu_sbpf = exp(ciu_f)
predict t, rstudent
display invttail(22,.025)
2.0738731
lowess t yhat, bwidth(0.2) msymbol(Oh) mcolor(gs10) clwidth(thick)yline(-2.07 0 2.07)
generate out = t > 1.96 | t < -1.96
tabulate out

Thursday, January 31, 2008

#4-#6 on Homework 3

Here are the codes I used for #4-#6 on Homework 3 (you have to do #3 to do #4 so I included it). I think the graph in #6 could be better, but it is a start.

3.
. generate logdol = log(dollars)

. generate logincid = log(incid)
(4 missing values generated)

. generate logprev = log(preval)
(4 missing values generated)

. generate loghospdays = log(hospdays)

. generate logmort = log(mort)

. generate logyrslost = log(yrslost)

. generate logdisbil = log(disabil)

. sw regress logdol logincid logprev loghospdays logmort logyrslost logdisbil, forward pe(.1) pr(.2)

4. . sw regress logdol logincid logprev loghospdays logmort logyrs
> lost logdisbil, pe(.1) pr(.2)

5. regress logdol loghospdays

6. regress logdol loghospdays logmort logyrslost logdisbil

. predict yhat, xb

.predict h, leverage

. predict std_yhat
(option xb assumed; fitted values)

. predict std_f, stdf

. generate ciu_f = yhat + invttail(25-5-1, .025)*std_f

. generate cil_f = yhat - invttail(25-5-1, .025)*std_f

. generate cil_sbpf = exp(cil_f)

. generate ciu_sbpf = exp(ciu_f)

. display invttail(19, 0.025)
2.0930241

. predict t, rstudent

lowess t yhat, bwidth(0.8) mcolor(gs10) clwidth(thick) yline(-1.96 0 1.96) x
> label(4.7(.1)5.1)

Wednesday, January 30, 2008

Chapter 2, #8

* Question 8

use "(your data source)\2.ex.vonHippelLindau.dta"

codebook disease

generate logp_ne = ln( p_ne)

generate logtumorvol = ln( tumorvol)

* VonHippel-Lindau regression

regress logp_ne logtumorvol if disease == 0

regress logp_ne logtumorvol if disease == 1

* Slope estimate for vonHippel-Lindau = .242; the 95%CI = (.056, .428)

* Slope estimate for Multiple Endocrine Neoplasia = .242, 95%CI = (.196, .671).

generate s2 = (.374688289*26 + .267608258*7)/(28+9-4)

generate var_dif = s2*(.0903891^2/.374688289+.1003964^2/.267608258)

generate t = (.2418597-.4337545)/sqrt(var_dif)

generate ci95_lb = (.2418597-.4337545) - invttail(33, .025)*sqrt(var_dif)

generate ci95_ub = (.2418597-.4337545) + invttail(33, .025)*sqrt(var_dif)

list s2 var_dif t ci95_lb ci95_ub in 1/1

display 2*ttail(33, abs(t))

* The null is confirmed. These two slopes are statistically equal.

* The 95% CI = (-.486, .102)

Code for Ch.2 #7

use "(your data source here)\2.ex.vonHippelLindau.dta"

generate logp_ne = ln( p_ne)

generate logtumorvol = ln( tumorvol)

regress logp_ne logtumorvol

predict lnresid, rstudent

predict logyhat, xb

predict logstdp, stdp

generate logci_u = logyhat+invttail(_N-2, .025)* logstdp

generate logci_l = logyhat-invttail(_N-2, .025)* logstdp

regress logp_ne logtumorvol

display invttail(_N-3, .025)

scatter lnresid logtumorvol, yline(-2.0322445 0 2.0322445) lowess lnresid logtumorvol

Monday, January 28, 2008

Code for #11, Chap 2

generate logsbp=log(sbp)

generate logbmi=log(bmi)

regress logsbp logbmi if sex==1

predict yhatmen, xb

predict menstd_f, stdf

generate menci_uf = yhatmen + invttail(_N-2,0.025)* menstd_f

generate menci_lf = yhatmen - invttail(_N-2,0.025)* menstd_f

scatter logsbp logbmi if sex==1, msymbol(o) scatter yhatmen logbmi if sex
> ==1, c(l) s(i) scatter menci_uf logbmi, c(l) s(i) scatter menci_lf log
> bmi, c(l) s(i)

regress logsbp logbmi if sex==2

predict yhatwom, xb

predict womstd_f, stdf

display invttail(_N-2,0.025)
1.9604692

generate womci_uf = yhatwom + invttail(_N-2,0.025)* womstd_f

generate womci_lf = yhatwom - invttail(_N-2,0.025)* womstd_f

scatter logsbp logbmi if sex==2, msymbol(o) scatter yhatwom logbmi if sex
> ==2, c(l) s(i) scatter womci_lf logbmi, c(l) s(i) scatter womci_uf logb
> mi, c(l) s(i)

Saturday, January 26, 2008

Welcome to the blog!

I created this blog so that we could post our STATA problems. I thought this would be much easier than trying to copy it all down in class. I think this will help us get through our class.

Biostatistics II