Friday, November 8, 2019
Mathematically Modelling Basketball Shots Essays
The manager of a professional basketball team faces a tough decision: which of his two top scorers this season is better at free-throw shots? The decision will determine the team selection for Saturday's Cup Final match. At a training session one week before the match, the coach decides to go all out and bring in some mathematical geniuses to model a situation in which Lee Grimes and Dominic Aspbury, the two scorers, shoot at the basketball net. The mathematicians are students from Cambridge, and they benefit from the opportunity because it provides evidence of coursework for their final exam. Their coursework uses their ability to collect data and test the appropriateness of a probability model in a real situation, whilst the coach's aim is to pick the better of the two players for the big game.

If the random variables X and Y count the number of independent trials up to and including the first occurrence of an event that has probability p, then X and Y have geometric distributions:

$P(X = r) = q^{r-1} p$, where $q = 1 - p$ and $r = 1, 2, 3, \ldots$

X ~ G(p) and Y ~ G(p), each with its own value of p.

I define X as the number of shots Lee needs to score a basket, and Y as the number of shots Dom needs to score a basket. I will attempt to see whether X and Y have geometric distributions by taking samples of X and Y.

The populations are the infinite range of shots that the two throwers could take in a given time period, under varied conditions, at the same level of skill. These populations are impossible to create, so the coursework has to rely on sampling, and the results may not be fully representative of the whole population. For this coursework I cannot take random samples, because the conditions cannot be recreated given the endless ways in which a shot could vary, e.g.
fatigue levels, mood and improvement in skill level during the sampling could all differ.

I will record a sample of X by asking Lee to shoot a number of baskets, and hence work out the relative frequency of success, p. This result will allow me to model X as X ~ G(p). Next I will record a sample of Y by asking Dom to shoot a number of baskets, so that a second value for the relative frequency of success can be calculated and used to model Y as Y ~ G(p).

The conditions will be kept as similar as possible so that the shots are independent and identical. This will involve:
* Five practice shots beforehand as a warm-up, so that the thrower has the feel of shooting before starting.
* Taking every shot from the same free-throw position, fifteen feet from the base of the net and perpendicular to the back line.
* Using the same type of shot each time: one hand to steady the ball and one to project it through the air, with the same arms used each time.
* Keeping the weather conditions similar. In the sports hall there should be no significant change in the environment.
* Taking the shots one after another, so that the results are gathered under the most similar conditions.
* Recording each attempt: a score counts as one basket, and a miss means trying again until the thrower succeeds.
* Continuing until the sample of eighty is reached, and recording all results.

If the data fit, I may be able to produce a reliable geometric model of the population from the sample, enabling me to predict population parameters with greater confidence. Using these parameters I should be able to compare the populations by considering the sample parameters.

I have chosen a geometric model because it is a discrete distribution with no upper limit on its values, so it can accommodate any number of shots that may be required to score a basket. The sum of all the probabilities equals one, as it must for a probability mass function.
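As a rough illustration of the claim that the geometric probabilities sum to one, the sketch below evaluates the formula P(X = r) = q^(r-1) p and adds up the first 200 terms. The function name `geometric_pmf` and the value p = 0.3 are assumptions chosen purely for illustration; they are not taken from the data.

```python
def geometric_pmf(r: int, p: float) -> float:
    """Return P(X = r) = q**(r - 1) * p for r = 1, 2, 3, ..."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    q = 1.0 - p  # probability of not scoring on a single shot
    return q ** (r - 1) * p


# Hypothetical success probability, used only to illustrate the formula.
p_example = 0.3

# The full sum runs to infinity; 200 terms is already very close to 1.
partial_sum = sum(geometric_pmf(r, p_example) for r in range(1, 201))
print(f"Sum of the first 200 probabilities: {partial_sum:.12f}")
```

The remaining tail is q^200, which is negligible for any realistic value of p.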
If X and Y have geometric distributions, the distribution should look like this: [graph of a geometric distribution]

The sample size will be 80, as a large sample makes the fitted geometric distribution as accurate as possible for testing purposes. It also allows me to use the chi-squared test to check the fit of the model for each thrower at various critical levels.

The assumptions I am making to allow the model to work are that the trials are:
* Identical: the factors are exactly the same. This provides a fair test and is a property of the geometric distribution.
* Independent: a trial is not affected by the previous trial. The geometric model states that the events must be independent. No distribution could possibly account for the endless variables and influences that could occur, e.g. improving skill as more shots are scored, fatigue etc. The variable would be different in each case. The five practice shots should bring the distribution closer to geometric, as they warm up the performer beforehand so that he gets used to the feel of shooting.
* Limited to two outcomes: score a basket or no score.
* Repeated until the sample size is reached.

Modelling the situation with a geometric distribution

Let X be the number of attempts Lee needs to score a basket.
Probability of scoring a basket: P(score) = number of baskets / total number of shots = 80/269 = 0.2973977695
This implies X ~ G(0.297)
X can be modelled as a geometric distribution with a probability of scoring first time equal to 29.7% (1 d.p.).
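The estimate of p and the quantities that follow from it can be reproduced in a few lines. The sketch below assumes only the totals reported in this essay (80 baskets from 269 shots for Lee, and 80 baskets from 345 shots for Dom); the function name `geometric_model` is hypothetical.

```python
BASKETS = 80  # each thrower continued until 80 baskets had been scored


def geometric_model(total_shots: int, max_r: int = 3):
    """Estimate p, then list P(X = r) and expected frequencies for r = 1..max_r."""
    p = BASKETS / total_shots          # relative frequency of success
    q = 1.0 - p                        # probability of not scoring
    probs = [q ** (r - 1) * p for r in range(1, max_r + 1)]
    expected = [prob * BASKETS for prob in probs]
    return p, probs, expected


lee_p, lee_probs, lee_expected = geometric_model(269)
dom_p, dom_probs, dom_expected = geometric_model(345)

for name, p, probs, expected in [
    ("Lee", lee_p, lee_probs, lee_expected),
    ("Dom", dom_p, dom_probs, dom_expected),
]:
    print(f"{name}: p = {p:.10f}")
    for r, (prob, freq) in enumerate(zip(probs, expected), start=1):
        print(f"  r = {r}: probability = {prob:.10f}, expected frequency = {freq:.7f}")
```

Raising `max_r` extends the table to as many cells as are needed before small expected frequencies are combined.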
Finding Prob(X = r)

Therefore P(no score) = 1 - P(score) = 1 - 0.2973977695 = 0.7026022305
Using the formula $P(X = r) = q^{r-1} p$, where r = 1, 2, 3, ..., q = probability of not scoring and p = probability of scoring:
P(X = 2) = 0.7026022305 x 0.2973977695 = 0.2089523362
P(X = 3) = $0.7026022305^{(3-1)}$ x 0.2973977695 = 0.1468103775

Finding Expected Frequency

Expected Frequency for (X = r) = Prob(X = r) x sample size
Therefore:
Expected Frequency for (X = 1) = 0.2973977695 x 80 = 23.791821
Expected Frequency for (X = 2) = 0.2089523362 x 80 = 16.7161869

Let Y be the number of attempts Dom needs to score a basket.
Probability of scoring a basket: P(score) = number of baskets / total number of shots = 80/345 = 0.231884058
This implies Y ~ G(0.232)
Y is geometric with a probability of scoring first time equal to 0.232 (3 d.p.). In other words, there is a 23.2% chance of scoring on the first attempt, and I aim to model these results by a geometric distribution.
Therefore P(no score) = 1 - 0.231884058 = 0.768115942
Therefore for Dom:
P(Y = 2) = 0.768115942 x 0.231884058 = 0.1781138416
P(Y = 3) = $0.768115942^{(3-1)}$ x 0.231884058 = 0.1368120813
Expected frequencies will be:
(Y = 1) = 0.231884058 x 80 = 18.55072464
(Y = 2) = 0.1781138416 x 80 = 14.24910733

Chi Squared Distribution

The chi-squared distribution can be applied to measure the goodness of fit of the geometric models. It examines how well the model fits by considering the number of possible outcomes of the events, and so tests the validity of the assumptions. The $\chi^2$ value is expected to be small if the model fits the real distribution. A large value would suggest that the model is unlikely to be correct, so I will use a 10% critical region to test it.
* If the $\chi^2$ value lies within the critical region then, assuming the model is correct, there is less than a 10% chance of a result as high as this occurring. We reject the model as a consequence and conclude insufficient sampling etc.
* Alternatively, if the value lies outside the critical region, a result of this size is reasonably likely under the model, so there is no evidence to reject it and the model is accepted. The conclusion would be that the statistical model is appropriate to the situation and that the assumptions are reasonable.

In the tables, the expected and observed frequencies were calculated, but how close together are the values? The closer the observed values are to the expected values, the better the geometric model fits. The goodness of fit statistic is:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where O = Observed Frequency and E = Expected Frequency.

To find the overall measure of goodness of fit, add up the values of this statistic over all cells and compare the total with the $\chi^2$ probability distribution tables. The chi-squared test should only be used if the expected frequency of each cell is more than five, which means some of the groups have to be combined. This allows the chi-squared distribution to be better approximated. The total of the expected frequencies should also be over 50, which makes the chi-squared test more accurate.

Lee's chi squared test

Using the equation, the result is $\chi^2$ = 7.478504913.
To analyse the result with the chi-squared test, the number of degrees of freedom has to be established as follows:
Degrees of Freedom = Number of Cells - Number of Constraints
In Lee's table there are seven cells. The number of constraints is two because:
  o A sample size of eighty is one constraint: the sample has to be eighty.
  o The probability is another constraint: the mean of the model has to equal the mean of the data, so we used the data to work this value out.
* Therefore: Degrees of Freedom = 7 - 2 = 5
* at the 10% critical level, i.e.
$P(\chi^2 < 9.236) = 0.9$ (result taken from the probability distribution table)
* but the observed value of $\chi^2$ = 7.478504913
* 7.478 is less than 9.236
* therefore, the value is not in the critical region

The value is not in the critical region, so there is no evidence at the 10% level to reject the model. Lee's results fit the geometric distribution model, and it is therefore a good model for Lee's data. The results are consistent with the assumptions, so we accept the assumptions as part of the geometric model. See graph above for an explanation of what the results show.

Dom's Chi Squared Test

Using the equation, the result is $\chi^2$ = 5.694287179.
* Degrees of Freedom = 8 - 2 = 6
* at the 10% critical level, i.e. $P(\chi^2 < 10.645) = 0.9$ (result taken from the probability distribution table)
* but the observed value of $\chi^2$ = 5.694287179
* 5.694 is less than 10.645
* therefore, the value is not in the critical region

Dom's results fit the geometric model, as the value is not in the 10% critical region. We can assume that the geometric model was a good model to use for his results. We can again accept the assumptions, as there is no evidence to suggest that they do not hold. See graph above for an explanation of what the results show.

Both results lie comfortably outside the critical region, which supports the reliability of the models and the validity of the assumptions made. We can adapt Dom's model so that five degrees of freedom are used, giving the same basis for comparison as Lee's result. I predict that this would not affect the results, because there would need to be a dramatic increase in the value for it to be of any significance. Both performers have had their results analysed at the same number of degrees of freedom and there was no significant difference. It makes no difference to the final conclusion, and there is still no evidence to reject the models.

Both results have shown that X and Y can be modelled by the geometric distribution.
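The table look-ups above can be cross-checked numerically. The sketch below is one possible approach using only the Python standard library: it evaluates the chi-squared cumulative distribution through the standard series for the regularised lower incomplete gamma function, then applies it to the two reported statistics. The function names are hypothetical, and the observed frequency tables are not reproduced here, so `chi_squared_statistic` is shown only as the general formula.

```python
import math


def chi2_cdf(x: float, dof: int) -> float:
    """P(chi-squared with `dof` degrees of freedom < x), via a power series."""
    if x <= 0:
        return 0.0
    a, half_x = dof / 2.0, x / 2.0
    term = 1.0 / math.gamma(a + 1.0)   # first term of the series
    total = term
    for n in range(1, 500):
        term *= half_x / (a + n)       # ratio of consecutive series terms
        total += term
        if term < 1e-16 * total:       # stop once further terms are negligible
            break
    return total * math.exp(-half_x) * half_x ** a


def chi_squared_statistic(observed, expected):
    """Goodness of fit statistic: the sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Reported statistic, degrees of freedom and 10% critical value for each thrower.
results = [("Lee", 7.478504913, 5, 9.236), ("Dom", 5.694287179, 6, 10.645)]

for name, statistic, dof, critical in results:
    in_critical_region = statistic > critical
    upper_tail = 1.0 - chi2_cdf(statistic, dof)
    print(f"{name}: P(chi2 < {critical}) = {chi2_cdf(critical, dof):.4f}, "
          f"upper-tail probability of the statistic = {upper_tail:.4f}, "
          f"in critical region: {in_critical_region}")
```

Working from the cumulative distribution rather than a single tabulated value makes it easy to repeat the check at other critical levels.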
Knowing this, I could produce confidence intervals for any parameters I estimate from the distributions. However, at this stage I will calculate only the relevant parameter for this piece of coursework: I will estimate the expected number of shots required by Lee and by Dom to score a basket.

Expected Mean Values

The expected mean value of a geometric distribution is defined as the sum to infinity of all the probabilities, each multiplied by the corresponding value of X (in Lee's case) or Y (in Dom's case). This simplifies conveniently to 1/p, where p is the probability of scoring on any one shot, i.e. P(X = 1):

$E[X] = \sum_{r=1}^{\infty} r \, q^{r-1} p = \frac{1}{p}$

For Lee the expected mean value would be E[X] = 1/p = 269/80 = 3.3625 (4 d.p.)
For Dom the expected mean value would be E[Y] = 1/p = 345/80 = 4.3125 (4 d.p.)

These results give the average number of shots it takes until the performer scores. Lee, having a lower expected mean value than Dom, is shown to be the better free-thrower, as he takes an average of approximately three shots to score, compared with approximately four in Dom's case.

The total number of shots can be a very rough indicator of who seems to be the better free-thrower. Lee took 269 shots and Dom took 345 shots to score 80 baskets. Does this imply that Lee is more accurate? The expected mean values and the probabilities of scoring for each model reinforce Lee's success, so all three measures are in his favour. There is now a much higher chance of Lee being picked for the game on Saturday.

One question in the investigation was whether taking shot after shot at the basket improved performance. This may happen because training has occurred and the brain is learning from past mistakes. The question being asked is: were the five practice shots enough to allow an independent model to be produced, or should they not have been taken at all? Raw data results were recorded in two stages, the first 40 and the second 40, and they suggest small decreases in many of the cells for the second 40, especially in Dom's case.
Lower values of X or Y become more frequent in the second 40. This complicates the results and so is a factor to consider if the coursework is completed again. The decreases in the higher X or Y values and the increases in the smaller X or Y values suggest evidence of fatigue, boredom, frustration etc. I can say now that skill level did not increase during the collection of the sample; what is more likely to have occurred is the opposite. The explanation for Dom being more tired, bored or frustrated is probably that he took a total of 345 shots, whereas Lee completed his sample in 269 shots.

Two parent populations (X and Y) have been tested against geometric probability models, and they fit them very snugly. Therefore, we can say that the number of shots taken before a basket is scored is modelled very well by a geometric distribution. There may be only two populations, but they show noticeable differences in their results and both remain well within the statistical model. I will assume that it is highly probable that most other populations would fit the geometric distribution, on the basis that my models are very appropriate for the investigation.

I have modelled the basketball situation in a real-life setting and the model was successful. Even though the situation is based on a theoretical distribution, it was modelled appropriately. The club should now prepare for Lee taking the role of free-thrower in this Saturday's cup final, and accept that Dom will be on the subs bench for the start of the game.

The data sampling was very organised and strict, but not random. To have taken a random sample would mean:
* Watching a random sample of club games throughout the season
* Watching a sample of the free-throws made by the performers in those games
* Calculating who is most accurate

A problem with this is time, as it would take a year to go through just one season; it is therefore impractical and illogical.
The physical form of the player is also likely to change throughout the season, so a random sample covering more than one season would have to be taken. A much better way is to watch all training sessions and take a general overview of who supplies the most points from free throws in miniature matches. This gives more of a view of consistency than of on-the-day performance, but in game situations the performer will be thinking more logically. A sample of eighty straight baskets is tedious and will affect performance.

Modifications
* Use a longer time period. The performers were rushed to collect their samples within two hours as a result of school timetabling, and so one of them had to rush his last twenty shots.
* Use the same time period for both performers. One performer did it one day and the other completed it the next day, so conditions may have been different and morale, energy etc. may have varied for both Dom and Lee.
* Use foot-mats on the floor to mark an exact position for the feet, instead of just using the line. This may make an insignificant difference, but any improvement to the coursework is better than none.
* Use the same basketball throughout. Halfway through the sample collection the basketball was lost, leaving us to use another basketball, perhaps of different weight, age etc., which may have affected the results.

Improvements
* I would like to calculate confidence intervals for both expected values (E[X] and E[Y]) to determine my degree of confidence in Lee being the better free-thrower.
* I would also like to see whether the difference between my results for E[X] and E[Y] is statistically significant.
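As a starting point for the first improvement, the sketch below shows one way such intervals might be computed. It is an assumption-laden illustration rather than part of the original analysis: it assumes the geometric model holds, uses the geometric variance q/p^2 with p estimated from the data, and relies on a normal approximation for the mean of 80 observations with z = 1.96 for an approximate 95% interval. The function name `mean_shots_interval` is hypothetical.

```python
import math

BASKETS = 80   # number of observations of X (or Y) in each sample
Z_95 = 1.96    # approximate 97.5th percentile of the standard normal


def mean_shots_interval(total_shots: int, n: int = BASKETS, z: float = Z_95):
    """Approximate confidence interval for the mean number of shots per basket."""
    mean = total_shots / n              # sample mean, equal to 1/p
    p = 1.0 / mean                      # estimated probability of scoring
    variance = (1.0 - p) / p ** 2       # variance of a geometric distribution
    half_width = z * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width


lee_mean, lee_low, lee_high = mean_shots_interval(269)
dom_mean, dom_low, dom_high = mean_shots_interval(345)

print(f"Lee: E[X] = {lee_mean:.4f}, interval ({lee_low:.3f}, {lee_high:.3f})")
print(f"Dom: E[Y] = {dom_mean:.4f}, interval ({dom_low:.3f}, {dom_high:.3f})")
```

Comparing the two intervals gives only an informal indication; a formal test of the difference between E[X] and E[Y] would be a separate step.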