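These are all experiences shared by other people online; I am collecting them here in the hope that they help fellow Chinese readers.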
1. Flip 10 coins, observed 6 heads. Q: fair or not?
2. Type I error, Power, relationship between them?
3. How would you model the # of years some patients would survive after a
primary surgery, given their family history and demographic covariates (e.g.,
age, race, etc.)? How would you diagnose the model?
4. Given a sample size of n, how do you obtain 95% confidence interval for
the median? Two cases: a) n is large, say n > 100. b) n is small, say n < 10
5. Search engine comparison and search quality evaluation, if using logistic
regression
6. Assume # of children in a family is X, p(X=x) = p_x, x=0, 1, 2, 3, 4, and
p(X > 4) = 0. Now randomly pick a child from a family, and this child turns
out to be a girl. Q: What is the probability that this girl has at least one
sister?
7. Simulate rolling a die: a fair die, an arbitrary (loaded) die, a die with more than 6 faces
8. Search quality result, how to design a distance function to compare
similarity?
9. Randomly call a US person asking household size. Q: What is the bias? Q:
Made 3 calls, household sizes 1, 2, 3, what is the unbiased estimate?
10. Explain logistic regression
11. An advertisement was shown to 10 people and got one click, how to estimate
the Click Through Rate?
12. 2 Ads, how to compare CTR?
13. p(head) = 1/3, p(tail) = 2/3, how to construct a game that replicates a
fair coin?
14. A fair coin, construct a game, with p(A) = 1/3, p(B) = 2/3
15. Many fair coins, how to construct an event with p = pi - 3
16. Two people play a die rolling game, the first to roll a 6 wins, does it
matter who rolls first?
17. Logistic regression, how to estimate parameters?
18. How to design a study to see whether people like Google Maps or another map?
19. Chrome data, searching data, how do you know people like Google or not?
20. Multicollinearity, ridge regression, bias/variance tradeoff
21. Difference between fixed effect model and random effect model
22. GLM, error structure, link function, estimation method
23. How to detect heterogeneity, see an increasing pattern, how to modify
your model
24. Suppose we want to modify search results by IP locations and compare
Click Through Rate. a) Divide into urban and rural, then compare, please
comment. b) What would you recommend. c) Suppose you got the result about
the treatment and control group, how would you conduct the test. d) If the
test power is low, how would you explain it to the PM, what would you suggest to
improve power?
25. Logistic regression, why can it not be replaced by linear regression?
26. Sampling from a normal distribution, values are not reported if x < 1. How
does this affect the type I error when conducting a test?
27. R dealing with big data, difficulties?
28. Search engine comparison and search quality evaluation, how to do with
logistic regression?
29. K ~ U[1, 2, …, 100], then toss a fair coin K times, p(exactly one head)
30. How to examine if a die is fair.
31. You are the TA in a class, a prospective student asks you how students
performed in the class, you have 10 exam scores for each student. Give only
one metric, support your claim.
32. 10000 data points, unknown distribution, how to find the center of the
data. a) Discuss the center, give scenarios when mean or median is
better. b) Find the median from the data (use resampling)
33. How to determine if a set of data is normally distributed
34. One data point is given from a normal distribution with given variance,
how to determine the mean? If the data were given from a truncated normal
(the data are recorded only when positive), how to compute the posterior
distribution?
35. A coin being tossed 100 times with 80 heads and 20 tails, what is the
probability of head? A coin being tossed 10 times with 8 heads and 2 tails,
what is the probability of head? How to explain this?
36. When you Google ‘Flower’, a commercial query, usually at the top or
right of the web page, some ads are shown. Now apply a new search algorithm
that leads to a color change in these ad links, how do we know whether the
algorithm change is reasonable?
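
For questions 13 and 14, one common approach (not necessarily the answer the interviewers expected) is rejection sampling: flip twice and throw away the outcomes that break the symmetry. The sketch below is only an illustration; the function names (biased_flip, fair_from_biased, one_third_from_fair, estimate) are made up for this post, and the simulation merely checks the frequencies empirically.

```python
import random


def biased_flip(rng, p_head=1 / 3):
    """Biased coin from question 13: 'H' with probability p_head, else 'T'."""
    return "H" if rng.random() < p_head else "T"


def fair_from_biased(rng, p_head=1 / 3):
    """Question 13: flip the biased coin twice, keep only HT or TH.

    P(HT) = P(TH) = p * (1 - p), so the first flip of an accepted pair
    is 'H' or 'T' with equal probability. HH and TT are discarded.
    """
    while True:
        first, second = biased_flip(rng, p_head), biased_flip(rng, p_head)
        if first != second:
            return first


def fair_flip(rng):
    """Fair coin from question 14."""
    return "H" if rng.random() < 0.5 else "T"


def one_third_from_fair(rng):
    """Question 14: flip a fair coin twice, redo on TT.

    The three remaining outcomes HH, HT, TH are equally likely,
    so HH -> 'A' has probability 1/3 and HT/TH -> 'B' has 2/3.
    """
    while True:
        pair = (fair_flip(rng), fair_flip(rng))
        if pair == ("T", "T"):
            continue
        return "A" if pair == ("H", "H") else "B"


def estimate(game, target, n=100_000, seed=0):
    """Empirical frequency of `target` over n independent plays of `game`."""
    rng = random.Random(seed)
    return sum(game(rng) == target for _ in range(n)) / n


if __name__ == "__main__":
    print("P(H) from biased coin game:", estimate(fair_from_biased, "H"))
    print("P(A) from fair coin game:  ", estimate(one_third_from_fair, "A"))
```

The first frequency should come out close to 1/2 and the second close to 1/3. Both games terminate with probability 1, but the number of flips per play is random.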