The online job exam for the two positions: Economic Statistics Data Team Leader, Social and Demographic Statistics Data Team Leader at NISR

1. What binary classifier evaluation metric is calculated with this formula:

(FP + FN) / (P + N)

2 points

1. F-score
2. Sensitivity
3. Precision
4. Error rate
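
As a minimal sketch of how the formula above behaves, the snippet below evaluates it on made-up confusion-matrix counts. The function name `error_rate` and the counts are illustrative assumptions, not part of the exam; P and N are taken to mean the numbers of actual positive and actual negative cases.

```python
def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases that the classifier labelled incorrectly."""
    positives = tp + fn  # P: actual positive cases
    negatives = tn + fp  # N: actual negative cases
    return (fp + fn) / (positives + negatives)


# Hypothetical confusion-matrix counts, for illustration only
rate = error_rate(tp=40, tn=45, fp=5, fn=10)
print(rate)  # 0.15
```

With these counts, 15 of 100 cases are misclassified, so the formula gives the proportion of wrong predictions.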

2.A cyclist completes a 40km journey in 3 hours. For the first 10km the cyclist’s average speed was 15km/hr. What was the cyclist’s average speed in the remaining 30km of the journey in km/hr?

2 points

1. 13
2. 11
3. 15
4. 14
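
One way to check the arithmetic in the cycling question above is to subtract the time spent on the first leg from the total time and divide the remaining distance by what is left. The variable names are illustrative only.

```python
total_distance_km = 40
total_time_h = 3
first_leg_km = 10
first_leg_speed_kmh = 15

# Time used on the first 10 km (40 minutes)
first_leg_time_h = first_leg_km / first_leg_speed_kmh

# Distance and time left for the rest of the journey
remaining_km = total_distance_km - first_leg_km
remaining_time_h = total_time_h - first_leg_time_h

remaining_speed_kmh = remaining_km / remaining_time_h
print(round(remaining_speed_kmh, 2))  # 12.86
```

The exact value is 90/7 km/hr, which rounds to 13.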

3.X and Y are independent random variables. X has mean 60 and standard deviation 8. Y has mean 40 and standard deviation 6. What are the mean and standard deviation of (X–Y)?

2 points

1. Mean 40, standard deviation 7
2. Mean 20, standard deviation 2
3. Mean 20, standard deviation 14
4. Mean 20, standard deviation 10
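
The following sketch applies the standard rules for a difference of independent random variables: the means subtract, while the variances (not the standard deviations) add. Variable names are illustrative.

```python
import math

mean_x, sd_x = 60, 8
mean_y, sd_y = 40, 6

# Means subtract
mean_diff = mean_x - mean_y

# For independent variables, Var(X - Y) = Var(X) + Var(Y)
sd_diff = math.sqrt(sd_x**2 + sd_y**2)

print(mean_diff, sd_diff)  # 20 10.0
```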

4.A random sample of 500 is taken from a much larger population and the 95% confidence interval for the population mean is calculated as 28.2 ± 1.8. A further and independent random sample of 500 is taken from the same population and a new 95% confidence interval for the population mean calculated on the combined sample of size 1,000. Which of the following is the most plausible new confidence interval?

2 points

1. 28.3 ± 1.8
2. 28.2 ± 0.4
3. 28.1 ± 1.0
4. 28.2 ± 4.6
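
As a rough guide to the reasoning, the half-width of a confidence interval for a mean shrinks in proportion to one over the square root of the sample size. The sketch below assumes the second sample has a similar spread to the first, so the estimated standard deviation is roughly unchanged; the centre of the new interval would be expected to stay close to the original estimate.

```python
import math

old_n, new_n = 500, 1000
old_half_width = 1.8

# Half-width is proportional to 1 / sqrt(n), assuming a similar standard deviation
new_half_width = old_half_width * math.sqrt(old_n / new_n)
print(round(new_half_width, 2))  # 1.27
```

Doubling the sample therefore narrows the interval by a factor of about 0.71 rather than halving it.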

5.Which regularization method produces sparse parameters?

2 points

1. Random dropout
2. Ridge
3. Lasso
4. Early stopping

6.A colleague says they used oversampling to create a training dataset. What challenge is it likely they are trying to address?

2 points

1. High dimensionality of the dataset
2. Imbalanced data
3. Missing values
4. Non-linear relationships between features
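
For readers unfamiliar with the term, here is a minimal sketch of random oversampling using only the standard library. The helper `random_oversample` and the toy data are hypothetical; in practice a dedicated library would normally be used. Rows from the smaller classes are duplicated at random until every class has as many rows as the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # All original rows belonging to this class
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy training set: four rows of class 0, one row of class 1
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```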

7.__________ representation is efficient for image processing.

2 points

1. Raster
2. Formal
3. Vector
4. Manual

8.A country is divided into 81 districts of various sizes. The 10th largest covers 40,000 hectares and the fifth smallest covers 6,500 hectares. Assuming the size difference from one district to the next largest is fairly uniform, which of the following is closest to the median size of the 81 districts?

2 points

1. 24500
2. 19500
3. 27500
4. 30500
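
One way to approach the district question above is linear interpolation over size ranks, taking the stated "fairly uniform" size difference as an exactly constant step. Counting from the smallest district, the fifth smallest has rank 5 and the 10th largest has rank 72. Variable names are illustrative.

```python
n_districts = 81

# Ranks counted from the smallest district (rank 1) to the largest (rank 81)
rank_small, size_small = 5, 6500
rank_large, size_large = n_districts - 10 + 1, 40000  # 10th largest is rank 72

# Assumed constant size difference between neighbouring ranks
step = (size_large - size_small) / (rank_large - rank_small)

median_rank = (n_districts + 1) // 2  # rank 41
median_size = size_small + (median_rank - rank_small) * step
print(step, median_size)  # 500.0 24500.0
```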

9.Which of these is wrong?

2 points

1. False negative = Incorrectly rejected case
2. All of the options
3. True negative = Correctly rejected case
4. False positive = Correctly identified case

10.Which of the following techniques can be used to convert a keyword into its base form?

2 points

1. Soundex
2. N-grams
3. Cosine Similarity
4. Lemmatization

11.Your team is beginning a new Geographic Information System (GIS) project using satellite imagery to identify residential areas in Rwanda. Which of the following will be most relevant for your project?

2 points

1. Euclidean space
2. Bray-Curtis Distance
3. Pythagorean space
4. None of the options

12.As part of a GIS project to analyse crop yield, you want to analyse the effect of rainfall. However, you only have rainfall measurements at points where there are weather stations, so you decide to use interpolation to estimate rainfall at other points. What principle are you using?

2 points

1. Spatial autocorrelation
2. Convolutional Neural Network Estimation
3. Thematic autocorrelation
4. Thematic auto-correction

13.You are setting up a new file management system where files will be stored within sub-folders, and sub-folders stored within topic folders. Your manager is interested in the hierarchical relationships between files and folders. Which of these visualizations is likely to be LEAST useful for your manager?

2 points

1. Graphic values
2. Circular treemap
3. Sunburst diagram
4. Area chart

14.Which of these is used to evaluate classifiers?

2 points

1. None of the options
2. RMSE
3. R Squared
4. Accuracy

15.Data are collected on a sample of girls aged from 5 to 11 years. Their age x, in years and their height y in cms are recorded and found to be consistent with a linear relationship. The regression line of height on age is y = 75 + 5x. Which one of the following is a correct conclusion?

2 points

1. Over the next three years we would expect a 6 year old girl in the sample to grow by about 15cms
2. The average height of the girls in the sample at age 15 is expected to be 150cms
3. The maximum height of the girls in the sample is 130cms
4. The regression line of age on height can be found by rearranging the equation to give x = 0.2y – 15
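
The fitted line in the question above can be explored with a short sketch. The function name `predicted_height_cm` is illustrative. Predictions are only meaningful inside the sampled age range of 5 to 11 years, so the helper refuses to extrapolate beyond it.

```python
def predicted_height_cm(age_years: float) -> float:
    """Height predicted by the regression line y = 75 + 5x."""
    if not 5 <= age_years <= 11:
        raise ValueError("age is outside the sampled range of 5 to 11 years")
    return 75 + 5 * age_years


# Expected growth between ages 6 and 9: slope (5 cm per year) times 3 years
growth_cm = predicted_height_cm(9) - predicted_height_cm(6)
print(growth_cm)  # 15
```

Note also that the regression line of age on height is fitted separately and is not, in general, obtained by rearranging the line of height on age.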

16.A survey of voting intentions for a forthcoming local election is to be taken by sampling the population of the local area using a register of voters and their addresses. In deciding how to select the sample which of the following factors is LEAST important?

2 points

1. The size of the population in the area
2. How varied the demographic characteristics of the population in the area are
3. How up to date the register of voters is
4. The amount of time and resources available for sampling the population

17.A solid cylindrical column is approximately 105 feet high and 10 feet in diameter. Which of the following is closest to its volume in cubic feet? (The formula for the volume of a cylinder is π × r² × h, where π = 3.14 approximately, r is the radius and h is the height.)

2 points

1. 8250
2. 32950
3. 3300
4. 16500
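
A quick check of the cylinder arithmetic, as a sketch using the approximation for π given in the question. The main trap is that the formula needs the radius, which is half the stated diameter.

```python
pi_approx = 3.14
height_ft = 105
diameter_ft = 10

radius_ft = diameter_ft / 2  # the formula uses the radius, not the diameter
volume_cubic_ft = pi_approx * radius_ft**2 * height_ft
print(round(volume_cubic_ft, 1))  # 8242.5
```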

18.Suppose the rate of interest on a savings account is 4% per annum, added to the account at the end of each year. How many years will it be before a sum of money deposited in the account has increased by a quarter?

2 points

1. 8
2. 5
3. 7
4. 6
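
The compound-interest question above can be checked by applying the 4% growth year by year until the balance is at least a quarter larger than the deposit. This sketch starts from a balance of 1.0, since the answer does not depend on the amount deposited.

```python
annual_rate = 0.04
target_ratio = 1.25  # deposit increased by a quarter

balance = 1.0
years = 0
while balance < target_ratio:
    balance *= 1 + annual_rate  # interest added at the end of each year
    years += 1

print(years)  # 6
```

After five years the balance has grown by about 21.7%, which is still short of a quarter; the sixth year's interest takes it to about 26.5%.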

Section B: Open questions / 14 marks

19.Estimates of population movements (human mobility) have traditionally been provided from relatively expensive sample surveys of individuals. What other data sources might now be available to replace or supplement survey data to provide insight into population movements on a daily basis? How could these sources be exploited? Please consider the types of application and analysis to be carried out, how this supplements survey data, and how the results would add to existing understanding of time use in the population. You may make plausible assumptions about the type of data available. You can assume that the data is available in consistent and well-documented format.

7 points

20.During the COVID-19 pandemic it became important to understand the impact of Coronavirus on the economy. What other data sources might now be available to replace or supplement traditional economic statistics to provide more timely, more frequent or more granular insights to understand changes in Rwanda’s economy? For each data source, explain the analytical techniques/tools which could be used. What are the potential analytical outcomes from this analysis? You can assume that the data is available in consistent and well-documented format.

7 points
