smathis30
Normally I'll post my rankings ahead of March Madness once the Kaggle competition opens up, but this year I wanted to stop using KenPom rankings and rely entirely on my own. So with that, here is the first batch of my homebrewed rankings.
Methodology:
Based on the Kaggle competition, the most important factors in determining success appeared to be the following five:
1. Points/Possession
2. Rebound ratio
3. Players that scored 10pts/game
4. Free Throw Rate
5. Turnover ratio
I use linear regression to assign weights to each category, which gives a RAW offensive and defensive score (efficiency).
From there, an adjustment factor is added to the RAW score, and then, based on the average SoS of each team's opponents on offense and defense, the result is multiplied by a standardization factor:
Real Score = (RAW Score + adjustment factor) * (1 - SoS multiplier * standard deviations above mean)
More standard deviations above the mean means an easier schedule, which is why that term carries a negative sign.
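A minimal Python sketch of the two steps above, assuming the RAW score is a plain weighted sum of the five factors. The weights and intercept are hypothetical placeholders, not fitted values, and `sos_std_devs` is assumed to be how many standard deviations above the mean the opponents' average SoS sits.

```python
def raw_score(factors, weights, intercept=0.0):
    """Weighted linear combination of the five factors.

    factors: [points/possession, rebound ratio, players scoring 10 pts/game,
              free throw rate, turnover ratio]
    weights/intercept: hypothetical stand-ins for the regression output.
    """
    if len(factors) != len(weights):
        raise ValueError("need one weight per factor")
    return intercept + sum(w * x for w, x in zip(weights, factors))


def real_score(raw, adjustment, sos_multiplier, sos_std_devs):
    """(RAW + adjustment) * (1 - SoS multiplier * std devs above mean).

    A positive sos_std_devs (easier schedule) shrinks the score.
    """
    return (raw + adjustment) * (1 - sos_multiplier * sos_std_devs)


# Made-up numbers: a schedule 1 std dev above the mean with a 0.05
# multiplier scales the adjusted score down by 5%.
print(real_score(raw=100.0, adjustment=2.0, sos_multiplier=0.05, sos_std_devs=1.0))
```

With a schedule exactly at the mean (`sos_std_devs=0`), the standardization factor is 1 and the score is left unchanged.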
Determining weights:
I use linear regression. Each game is painstakingly entered, and its result is plugged into a formula that uses least-squares regression to determine a score based on:
1. (Visiting O - Home D) * m1 + b1
2. (Home O - Away D) * m2 + b2
From there, I have an expected score differential. I use it to find the probability that the home team wins, assuming a standard deviation of 10 points, based on what I found online about the typical score deviation in NCAAMB.
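The differential-to-probability step can be sketched with the standard library's `statistics.NormalDist`. This assumes formula 1 predicts the visiting team's points and formula 2 the home team's, with the differential taken as home minus visitor. The `m`/`b` values below are hypothetical, not the fitted ones.

```python
from statistics import NormalDist


def expected_margin(home_o, home_d, away_o, away_d, m1, b1, m2, b2):
    """Projected home-minus-visitor margin (assumed reading of the two formulas)."""
    visitor_points = (away_o - home_d) * m1 + b1
    home_points = (home_o - away_d) * m2 + b2
    return home_points - visitor_points


def home_win_probability(margin, sd=10.0):
    """P(home wins) if the actual margin is Normal(mean=margin, sd=sd).

    P(actual margin > 0) = Phi(margin / sd), i.e. a Normal(0, sd) CDF at margin.
    """
    return NormalDist(mu=0.0, sigma=sd).cdf(margin)


margin = expected_margin(home_o=110, home_d=95, away_o=100, away_d=100,
                         m1=1.0, b1=0.0, m2=1.0, b2=0.0)
print(margin, round(home_win_probability(margin), 3))
```

A projected margin of 0 gives exactly 50%. Each additional 10 points of projected margin (one standard deviation) moves the probability along the normal curve, not linearly.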
If the projected winner is the same as the actual winner, the "game score" is -1*LN(probability), where the probability is the one assigned to the projected winner.
If it's wrong, the game score is -1*LN(1 - probability).
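Both cases reduce to the same rule: the game score is -ln of the probability the model gave to the team that actually won (the usual log loss). A sketch follows, taking `prob_home_win` from the previous step. The `eps` clamp is an added safeguard, not part of the original method; it keeps a near-certain pick that loses from hitting log(0).

```python
import math


def game_score(prob_home_win, home_won, eps=1e-12):
    """-ln(probability) for a correct pick, -ln(1 - probability) for a wrong one.

    'probability' is the confidence in the projected winner (always >= 0.5).
    """
    projected_home = prob_home_win >= 0.5
    p = prob_home_win if projected_home else 1 - prob_home_win
    p = min(p, 1 - eps)  # safeguard against log(0) when a near-certain pick loses
    correct = projected_home == home_won
    return -math.log(p) if correct else -math.log(1 - p)


print(round(game_score(0.95, home_won=False), 3))  # confident and wrong: 2.996
print(round(game_score(0.95, home_won=True), 3))   # confident and right: 0.051
print(round(game_score(0.51, home_won=True), 3))   # barely right: 0.673
```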
Because of how natural logs behave, confident picks that turn out wrong are penalized far more heavily than barely-right picks. A wrong pick made at 95% confidence adds -ln(0.05) ≈ 3.0 points, and picking Delaware St over La Tech (they didn't win) added 2.553 points, whereas Duke beating East Central Western State University adds essentially 0. The average game score across all games is computed, and linear regression is then conducted to minimize that average and assign weights to each of the seven variables (five for the score, two for standardization).
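A sketch of the averaging step, plus a quick check on the example: since a wrong pick scores -ln(1 - confidence), a score of 2.553 backs out to roughly 92% confidence. Games are represented as hypothetical `(prob_home_win, home_won)` pairs. The fitting step that picks the seven weights is not shown.

```python
import math


def average_game_score(games):
    """Mean of -ln(probability given to the actual winner) over a non-empty list of games."""
    total = 0.0
    for prob_home_win, home_won in games:
        p_actual = prob_home_win if home_won else 1 - prob_home_win
        total += -math.log(p_actual)
    return total / len(games)


def implied_confidence(wrong_pick_score):
    """Invert score = -ln(1 - confidence) for a pick that lost."""
    return 1 - math.exp(-wrong_pick_score)


print(round(implied_confidence(2.553), 3))
print(round(average_game_score([(0.95, False), (0.95, True)]), 3))
```

The second line shows how one confident miss dominates the average: the paired correct pick barely offsets it.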
So with that, here are the top 25 through today and the ACC rankings.
Projected scores this week for the good guys in gold:
GT 71 NW 81
St Johns 78 GT 77