## Multiple choice questions

1. “A measure of fit that indicates the approximate proportion of variation explained by the model.” This statement describes which of the following measures?

- Recall
- Precision
- Sensitivity
- R-Squared
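
As an illustration of what "proportion of variation explained" means, here is a minimal sketch of the usual coefficient-of-determination calculation, 1 minus the ratio of residual to total sum of squares. The function name `r_squared` and the data are hypothetical, chosen only to show the arithmetic.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired observations."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the observations around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical data, not taken from any real study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(actual, predicted)
print(f"R-squared = {r2:.2f}")
```

A value close to 1 means the predictions account for most of the variation in the observations.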

2. A cyclist completes a 35 km journey in 2 hours. For the first 2 km the cyclist’s average speed was 6 km/h. What was the cyclist’s average speed over the remaining 33 km of the journey, rounded to the nearest km/h?

- 17
- 20
- 18
- 15
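
The arithmetic behind this question can be sketched as follows: subtract the time spent on the first leg from the total time, then divide the remaining distance by the remaining time. The function name `remaining_speed` is an illustrative choice.

```python
def remaining_speed(total_km, total_h, first_km, first_kmh):
    """Average speed over the part of the journey after the first leg."""
    first_h = first_km / first_kmh  # time spent on the first leg
    return (total_km - first_km) / (total_h - first_h)


speed = remaining_speed(total_km=35, total_h=2, first_km=2, first_kmh=6)
print(round(speed))
```

The first 2 km take 20 minutes, so 33 km are covered in the remaining 1 hour 40 minutes, which is about 19.8 km/h.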

3. A random sample of 500 is taken from a much larger population and the 95% confidence interval for the population mean is calculated as 423 ± 52. A further, independent random sample of 400 is taken from the same population, and a new 95% confidence interval for the population mean is calculated on the combined sample of size 900. Which of the following is the most plausible new confidence interval?

- 423 ± 98
- 424 ± 41
- 423 ± 52
- 425 ± 11
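
A rough way to reason about this: the half-width of a confidence interval for a mean is proportional to 1/√n, so it should shrink by about √(500/900) when the sample grows from 500 to 900. The sketch below assumes the estimated population standard deviation stays roughly the same in the combined sample; the function name `rescaled_half_width` is hypothetical.

```python
import math


def rescaled_half_width(half_width, n_old, n_new):
    """Scale a CI half-width by sqrt(n_old / n_new), assuming the same
    standard deviation estimate and confidence level."""
    return half_width * math.sqrt(n_old / n_new)


new_half_width = rescaled_half_width(52, n_old=500, n_new=900)
print(f"{new_half_width:.1f}")
```

Under that assumption the new half-width is a little under 40, and the centre of the interval can move slightly because the extra 400 observations change the sample mean.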

4. X and Y are independent random variables. X has mean 100 and standard deviation 12. Y has mean 30 and standard deviation 9. What are the mean and standard deviation of (X–Y)?

- Mean 65, standard deviation 10.5
- Mean 70, standard deviation 21
- Mean 70, standard deviation 3
- Mean 70, standard deviation 15
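
For independent random variables, means subtract while variances add, so the standard deviation of a difference is the square root of the sum of the squared standard deviations. A short sketch of that rule (the function name `difference_mean_sd` is illustrative):

```python
import math


def difference_mean_sd(mean_x, sd_x, mean_y, sd_y):
    """Mean and standard deviation of X - Y for independent X and Y."""
    mean_diff = mean_x - mean_y
    # Variances add even though the variables are subtracted.
    sd_diff = math.sqrt(sd_x ** 2 + sd_y ** 2)
    return mean_diff, sd_diff


mean_diff, sd_diff = difference_mean_sd(100, 12, 30, 9)
print(mean_diff, sd_diff)
```

Here 12² + 9² = 225, whose square root is 15.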

5. A welfare grant is provided to all individuals who earn less than 100 RWF a month. A researcher is interested in finding the effect of this grant on consumption. What regression strategy would be most effective at determining this effect?

- Support Vector Machine
- Regression Discontinuity Analysis
- Linear Regression
- Ridge Regression

6. A colleague says they used Ridge Regression regularisation as part of their regression analysis. What challenge is it likely they are trying to address?

- Non-normal sampling distribution
- Non-linear relationships between features
- Overfitting
- High dimensionality of the dataset

7. You have been tasked with developing a model to identify the topic of news articles that appear on a local website. The sample size is sufficient for modelling but not large, with approximately 1000 articles. Which of the following ordered sets of methods would be most appropriate for achieving this?

- Word2Vec – Logistic Regression Model
- Doc2Vec – Logistic Regression Model
- Logistic Regression Model – Bag of Words
- Lemmatization – Bag of Words – Logistic Regression Model

8. Which of these is the most significant problem in dealing with right-censored data?

- Right-censored data can be analysed the same as any data
- None of the options
- Right-censored data results in too few observations to draw results from
- The data may contain significant variation beyond the observation period that could bias any results drawn from the data

9. Which of the following models uses L1 regularisation?

- Lasso Regression
- None of the options
- Convolutional Neural Network
- Ridge Regression

10. Which of these is wrong?

- True negative = Incorrectly rejected case

- All of the options
- True positive = Incorrectly identified case
- False negative = Correctly rejected case

11. Which of the following can be used to determine the similarity of words, paragraphs or documents that have been represented as vectors?

- Soundex
- Stemming
- Cosine Distance
- N-grams
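
To make the idea of comparing vector representations concrete, here is a minimal pure-Python sketch of cosine similarity and the corresponding cosine distance (1 minus the similarity). The vectors are hypothetical stand-ins for word or document embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Distance derived from cosine similarity: 0 means same direction."""
    return 1 - cosine_similarity(u, v)


# Hypothetical vectors: the second is a scaled copy of the first,
# the last pair is orthogonal.
same_direction = cosine_distance([1, 2, 3], [2, 4, 6])
orthogonal = cosine_distance([1, 0], [0, 1])
print(same_direction, orthogonal)
```

Vectors pointing in the same direction have a distance near 0 regardless of their length, while orthogonal vectors have a distance of 1.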

12. Your team is beginning a new Geographic Information System (GIS) project using satellite imagery to identify residential areas in Rwanda. Given that you only have a small amount of training data that is already classified, which of the following machine learning techniques would be most appropriate for this task?

- Linear Regression
- Convolutional Neural Networks
- K-Nearest Neighbours
- None of the options

13. As part of a GIS project, you have been tasked with identifying the point on each road in Rwanda that represents the border of a district. Which standard GIS tool and datasets would be most appropriate for this task?

- Applying the intersection tool to district and road shapefiles
- None of the options
- Applying a regression discontinuity model to an Excel file of road quality data
- Applying the union tool to district and road shapefiles

14. You are visualising the results from a new survey where province-level data is contained within countries, country-level data is contained within continents and continent-level data is contained within global data. Your manager is interested in the hierarchical relationships between provinces, countries and continents. Which of these visualisations is likely to be LEAST useful for your manager?

- Treemap
- Bar chart
- Sunburst diagram
- Circular treemap

15. Data are collected on a sample of girls aged from 9 to 15 years. Their age x, in years, and their height y, in cm, are recorded and found to be consistent with a linear relationship. The regression line of height on age is y = 90 + 6x. Which one of the following is a correct conclusion?

- The maximum height of the girls in the sample is 100cms
- Over the next three years we would expect a 9 year old girl in the sample to grow by about 18cms
- The average height of the girls in the sample at age 15 is expected to be 120cms
- The regression line of age on height can be found by rearranging the equation to give x = 0.167y – 15
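
The options can be checked by evaluating the fitted line y = 90 + 6x at the ages mentioned. This is a small sketch; the function name `predicted_height` is an illustrative choice.

```python
def predicted_height(age_years, intercept=90, slope=6):
    """Height in cm predicted by the regression of height on age."""
    return intercept + slope * age_years


growth_9_to_12 = predicted_height(12) - predicted_height(9)
height_at_15 = predicted_height(15)
print(growth_9_to_12, height_at_15)
```

The slope gives the expected growth per year, so three years corresponds to 3 × 6 cm. Note also that rearranging y = 90 + 6x for x is only algebraic inversion; the regression of age on height is fitted separately and is generally a different line.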

16. A survey of urban income and living standards is to be completed between January 2020 and January 2022. The sample is drawn from the population of urban areas. In deciding how to select the sample, which of the following factors is likely to be the MOST important?

- Timing, location, duration and intensity of Covid-19 measures across the time period
- The amount of time and resources available for sampling the population
- The size of the population in the area
- How different the demographic characteristics are of the population in the area

17. A solid cylindrical drinks can is approximately 25 centimetres high and 8 centimetres in diameter. Which of the following is closest to its volume in cubic centimetres? (The formula for the volume of a cylinder is π × r^2 × h, where π = 3.14 approximately, r is the radius and h is the height.)

- 1256
- 1250
- 400
- 1886
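
The calculation uses the radius, which is half the stated diameter. A minimal sketch using the question's approximation π = 3.14 (the function name `cylinder_volume` is illustrative):

```python
def cylinder_volume(height_cm, diameter_cm, pi=3.14):
    """Volume of a cylinder in cubic centimetres, using pi * r^2 * h."""
    radius_cm = diameter_cm / 2  # the question gives the diameter
    return pi * radius_cm ** 2 * height_cm


volume = cylinder_volume(height_cm=25, diameter_cm=8)
print(round(volume))
```

With r = 4 cm the volume is 3.14 × 16 × 25, which is approximately 1256 cm³.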

18. Suppose the rate of interest on a savings account is 2.5% per annum, added to the account at the end of each year. How many years will it be before a sum of money deposited in the account has increased by more than half?

- 20
- 10
- 8
- 14
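
One way to check the options is to compound the balance year by year until it exceeds 1.5 times the deposit. The sketch below assumes annual compounding exactly as stated in the question; the function name `years_to_exceed` is hypothetical.

```python
def years_to_exceed(rate, target_multiple):
    """Count whole years of annual compounding until the balance
    exceeds target_multiple times the original deposit."""
    balance = 1.0
    years = 0
    while balance <= target_multiple:
        balance *= 1 + rate  # interest added at the end of each year
        years += 1
    return years


years = years_to_exceed(rate=0.025, target_multiple=1.5)
print(years)
```

At 2.5% the balance first passes 1.5 times the deposit at the end of year 17 (1.025¹⁶ ≈ 1.48, 1.025¹⁷ ≈ 1.52), so a candidate number of years is long enough only if it is at least 17.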

19. In which of the following circumstances would you expect stratifying to be most useful in helping to select a suitable sample?

- The survey is well resourced, and preliminary research indicates that the population is likely to be dispersed along clearly identifiable variables
- When the variation of the population is very different from that of potential clusters
- When a survey has few resources
- When the population of interest is spread out over a large area

20. A container holds many thousands of a metal component. 80% of these are made of one type of metal and weigh 16 grams. The other 20% are made of another type of metal and weigh 25 grams. Which one of the following statistical distributions does the mean weight of a random selection of 100 components from the container most closely follow?

- Normal
- Continuous uniform
- Exponential
- Poisson
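
The behaviour of the sample mean can be explored with a small simulation. This is a sketch under stated assumptions: components are drawn independently with an 80% chance of weighing 16 g (reasonable when the container is very large), and the seed, replication count and function name `sample_mean_weight` are arbitrary illustrative choices.

```python
import random
import statistics


def sample_mean_weight(rng, n=100):
    """Mean weight in grams of n components drawn independently."""
    weights = [16 if rng.random() < 0.8 else 25 for _ in range(n)]
    return sum(weights) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
means = [sample_mean_weight(rng) for _ in range(5000)]

sim_mean = statistics.mean(means)
sim_sd = statistics.stdev(means)
print(f"mean of sample means: {sim_mean:.2f}")
print(f"sd of sample means:   {sim_sd:.2f}")
```

Theory gives a mean of 0.8 × 16 + 0.2 × 25 = 17.8 g and a standard deviation of 9 × √(0.8 × 0.2) / √100 = 0.36 g for the sample mean. By the central limit theorem, a histogram of `means` should look approximately bell-shaped around 17.8 even though individual weights take only two values.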