A cluster is a set of closely grouped data. Data may cluster around a point or along a line.
An outlier is a data point that is very different from the rest of the data in the set.
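One informal way to make these two definitions concrete is to measure how far each point sits from its nearest neighbor. The sketch below is only an illustration: the function name `find_outliers`, the sample points, and the distance threshold are all assumptions made up for this example, not values taken from the problems that follow.

```python
import math

def find_outliers(points, threshold):
    """Return the points whose nearest neighbor is farther away than threshold."""
    outliers = []
    for i, p in enumerate(points):
        # Distance from p to the closest other point in the set.
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if nearest > threshold:
            outliers.append(p)
    return outliers

# Made-up points: two tight groups (clusters) plus one isolated point.
sample = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 15)]
print(find_outliers(sample, threshold=3))
```

A suitable threshold depends on the scale of the data, so when the two axes use very different units the coordinates may need rescaling before distances are compared.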
Problem 1 :
A scientist gathers information about the eruptions of Old Faithful, a geyser in Yellowstone National Park. She uses the data to create a scatter plot. The data show the length of time between eruptions (interval) and how long the eruption lasts (duration).
1) Describe any clusters you see in the scatter plot.
2) What do the clusters tell you about eruptions of Old Faithful?
3) Describe any outliers you see in the scatter plot.
4) Suppose the geyser erupts for 2.2 minutes after a 75-minute interval. Would this point lie in one of the clusters? Would it be an outlier? Explain your answer.
5) Suppose the geyser erupts after an 80-minute interval. Give a range of possible duration times for which the point on the scatter plot would not be considered an outlier. Give your reasoning.
Solution :
1) There are clusters around 50 minutes and 80 minutes.
2) Eruptions of Old Faithful tend to occur after intervals of about 50 minutes or about 80 minutes.
3) The point near (57, 3) appears to be an outlier because it does not fall into either cluster.
4) It would not lie in either cluster because the interval was too long for the first cluster, and the duration was too short for the second cluster. It might be considered an outlier because it is not very close to the rest of the data.
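5) The possible duration is 3 to 5 minutes. A duration in this range would place the point within the cluster around 80 minutes, so it would not be considered an outlier.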
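The reasoning in parts 4 and 5 can be sketched as a simple range check. In the sketch below, each cluster is treated as a rectangle of interval and duration values. Only the 3 to 5 minute duration range comes from the solution above. The other bounds, the cluster names, and the function `cluster_of` are assumptions chosen for illustration, since the actual plot values are not listed here.

```python
# Assumed rectangular bounds for the two clusters (hypothetical values,
# except the 3 to 5 minute duration range given in the solution).
CLUSTERS = {
    "short": {"interval": (45, 60), "duration": (1.5, 2.5)},
    "long": {"interval": (70, 90), "duration": (3.0, 5.0)},
}

def cluster_of(interval, duration):
    """Return the name of the cluster containing the point, or None."""
    for name, box in CLUSTERS.items():
        lo_i, hi_i = box["interval"]
        lo_d, hi_d = box["duration"]
        if lo_i <= interval <= hi_i and lo_d <= duration <= hi_d:
            return name
    return None

# Part 4: interval too long for "short", duration too short for "long".
print(cluster_of(75, 2.2))
# Part 5: an 80-minute interval with a duration between 3 and 5 minutes.
print(cluster_of(80, 4.0))
```

Under these assumed bounds, the point (75, 2.2) falls in neither rectangle, while (80, 4.0) falls in the cluster around 80 minutes.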
Problem 2 :
The scatter plot shows the basketball shooting results for 14 players. Describe any clusters you see in the scatter plot. Identify any outliers.
Solution :
x-coordinate ==> shots attempted
y-coordinate ==> shots made
Cluster :
There is a cluster around the point (20, 15): players who attempted about 20 shots made about 15 of them.
Outlier :
The point (35, 18) lies far from the other points and does not follow their pattern. So, this is the outlier.
Problem 3 :
1) Does this scatter plot show a positive association, a negative association, or no association? Explain why.
2) Is there an outlier in this data set? If so, approximately how old is the outlier, and about how many minutes does he or she study per day?
3) Is this association linear or nonlinear?
4) What can you say about the relationship between age and the amount of time spent studying?
Solution :
1) By drawing a line of best fit for the scatter plot shown above, we get a rising line. So, it is a positive association.
2) The point (12, 76) is the outlier: a 12-year-old who studies about 76 minutes per day.
3) It is a linear relationship.
4) The data cluster between 5 and 15.
Problem 4 :
1) Does this scatter plot show a positive association, a negative association, or no association?
2) Is there an outlier in this data set? If so, approximately how old is the outlier, and about how many minutes does he or she study per day?
3) Is this association linear or nonlinear?
Solution :
1) By drawing a line of best fit for the scatter plot shown above, we get a falling line. So, it is a negative association.
2) (5.6, 96%) is the outlier.
3) It is a linear relationship.
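The "rising line" and "falling line" reasoning used in Problems 3 and 4 can be checked numerically by computing the slope of a least-squares line of best fit. The sketch below is one way to do that. The function names and the two small data sets are hypothetical, made up for illustration, and are not the points from the scatter plots. It also assumes the x-values are not all equal.

```python
def best_fit_slope(points):
    """Slope of the least-squares line through a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx  # assumes the x-values are not all identical

def association(points, tolerance=1e-9):
    """Classify the linear trend by the sign of the best-fit slope."""
    slope = best_fit_slope(points)
    if slope > tolerance:
        return "positive"
    if slope < -tolerance:
        return "negative"
    return "no linear trend"

# Made-up data sets for illustration only.
rising = [(1, 2), (2, 4), (3, 5), (4, 8)]
falling = [(1, 9), (2, 7), (3, 4), (4, 2)]
print(association(rising))
print(association(falling))
```

The sign of the slope only describes the linear trend. Whether the association is actually linear, and whether any point is an outlier, still has to be judged from the plot itself.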