Data Analysis and Interpretation

Question 4
The insurance company's data set covers 45 claims, ranging from \$561 to \$3165. The median is \$1496 and the mean is \$1593; because the mean lies above the median, the claims above the median are spread over a wider range than those below it. The value of the standard deviation also indicates a wide spread.
Because the mean is above the median, the larger claims must lie further from the median than the smaller ones do. By definition half of the claims fall on each side of the median, so the shift to the right is in the total amount paid: claims above \$1496 account for more money than claims below the median.
This indicates that a large share of the money paid out went towards settling claims above \$1496. The high of \$3165 and the low of \$561 also help explain the high standard deviation, an indication of the wide spread. The skew is further confirmed by the wider range of the upper quarter of the data, from \$2063 to \$3165, compared with \$561 to \$1091 for the lower quarter. This is unusual, as the skew to the right means that expensive claims account for a large share of the amount paid out by the insurance company. The middle 50% of the claims, however, fell between \$1091 and \$2063, suggesting that a typical claim falls within this range.
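The comparison of the lower and upper quarters can be checked with a short sketch. It uses only the summary figures quoted in this answer and assumes that \$1091 and \$2063 are the first and third quartiles; the dictionary keys and the function name are illustrative, not part of the original output.

```python
# Summary statistics quoted in the answer (US dollars).
# Assumption: 1091 and 2063 are the first and third quartiles.
summary = {"min": 561, "q1": 1091, "median": 1496,
           "q3": 2063, "max": 3165, "mean": 1593}


def spread_report(s):
    """Return the width of each outer quarter, the IQR, and mean minus median."""
    return {
        "lower_quarter_width": s["q1"] - s["min"],
        "upper_quarter_width": s["max"] - s["q3"],
        "iqr": s["q3"] - s["q1"],
        "mean_minus_median": s["mean"] - s["median"],
    }


report = spread_report(summary)

# A mean above the median together with a longer upper quarter
# is the usual sign of a right-skewed distribution.
right_skewed = (report["mean_minus_median"] > 0
                and report["upper_quarter_width"] > report["lower_quarter_width"])

print(report)
print("right-skewed:", right_skewed)
```

The upper quarter is roughly twice as wide as the lower quarter, which is the asymmetry described in the answer.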
Question 6
a) The data below the median are spread over a much wider range than the data above it. The difference between the lowest figure, 16.3, and the median, 21.8, is 5.5, while the difference between the median and the highest figure, 24.3, is just 2.5. This shows that the lower half of the data is more spread out, while the upper half is more clustered. The difference between the upper mean and the real mean is also smaller than the difference between the lower mean and the real mean, which confirms that the data are more spread out below the median. Overall, however, the distribution is quite clustered, as the standard deviation is small and consistent with the narrow range of figures, 16.3 to 24.3. Considering that 49 tests were run, a spread of just 8 is quite small.
b) The mean is 21.042857 while the median is 21.8. The median is not very different from the mean, meaning that the distribution is close to symmetric but skewed a little to the left, as the values below the median lie further from it than the values above it, pulling the mean below the median. This is consistent with the fact that the range of figures in the upper half of the distribution is smaller than that in the lower half (2.5 and 5.5 respectively).
c) Yes, there is a discernible pattern, although it is not a regular one. The data fluctuate widely: a high energy reading may be followed immediately by a very low one. The result is a zigzag that is regular at some points but irregular most of the time. The fluctuations are also sudden rather than progressive.
d) When the connect points option is removed, there appears to be no pattern at all to the distribution of the data.
e) Connection of the data points gives a sense of continuity, while a lack of connection gives a sense of each point being independent of the next one. The connections also show a pattern even though not a regular or progressive one, whereas when not connected, the points appear random and haphazardly distributed.
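The arithmetic behind parts a) and b) can be reproduced from the four figures quoted there. This is a minimal sketch; the variable names are illustrative, and it treats 16.3 and 24.3 as the lowest and highest readings.

```python
# Figures quoted in parts a) and b); units as in the original data.
# Assumption: 16.3 and 24.3 are the minimum and maximum readings.
lowest, median, highest, mean = 16.3, 21.8, 24.3, 21.042857

# Rounding removes floating-point noise from the subtractions.
lower_half_range = round(median - lowest, 1)    # spread below the median
upper_half_range = round(highest - median, 1)   # spread above the median
total_range = round(highest - lowest, 1)
mean_minus_median = round(mean - median, 6)

# A mean below the median points to a longer tail on the left.
skew_direction = "left" if mean_minus_median < 0 else "right or none"

print(lower_half_range, upper_half_range, total_range)
print(mean_minus_median, skew_direction)
```

The two half-ranges add up to the total range of 8, and the negative difference between mean and median matches the slight left skew described in part b).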
Question 7
The data set indicates that the majority of the top 100 richest Russians are aged between 41 and 51 years. Of those above the median age, those below 50 are the highest in number. Although the cumulative age of those above 46 might be expected to push the mean higher, at 46.89 it is only slightly above the median, meaning that the number of those aged between 40 and 50 is the highest, as mentioned earlier. The data set also indicates that those with a net worth of less than \$535 million are clustered, as the lowest stands at \$210 million while the median is \$535 million. The distribution of those above the median is more spread out, as the highest stands at \$15.2 billion, a very large difference compared with the last individual on the list. This explains the large standard deviation. The majority of those above \$535 million, however, fall in the bracket between \$535 million and \$3.8 billion. This bracket contains the highest number, as 40% fall within it.