January 2018 – That's a FOUL!

So Howcum Gig Harbor is #1 in 3A RPI?

RPI = 0.25 * WP + 0.50 * OWP + 0.25 * OOWP

That’s the well-known WIAA RPI formula. It helps to win games. Those get weighted 25%. It helps to play good opponents. That gets weighted 50%. And if those opponents also play good opponents that helps too, weighted 25%.
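To make the weighting concrete, here is a minimal Python sketch of that formula. The function name `rpi` is just for illustration, and the OWP and OOWP values plugged in are placeholders, not numbers from the WIAA.

```python
def rpi(wp, owp, oowp):
    """Weighted sum from the formula above: 25% WP, 50% OWP, 25% OOWP."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# A 10-2 record gives WP = 10/12. The OWP and OOWP inputs below are
# placeholders chosen for illustration, not figures from the post.
base = rpi(10 / 12, owp=0.500, oowp=0.500)

# Raising OWP by .100 moves RPI twice as far as raising WP by .100,
# because OWP carries twice the weight.
from_owp = rpi(10 / 12, owp=0.600, oowp=0.500) - base
from_wp = rpi(10 / 12 + 0.100, owp=0.500, oowp=0.500) - base
print(round(from_owp, 3), round(from_wp, 3))
```

Because OWP carries half the total weight, who you play moves the number more than whether you beat them.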

Comparing Gig Harbor with Prairie (the poster child for RPI victims, at least according to the Vancouver Columbian):

Prairie has a better win percentage: .833 (10-2) v .692 (8-4). Prairie will win all their league games. They haven’t lost a league game since 2000. The only possible loss is to Rogers (Puyallup), but that is still a likely Prairie win. Gig Harbor is favored in 7 of their 8 remaining games. Both teams are likely to see an increase in their WP, although the higher WP is, the less impact a win has on increasing WP.
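The shrinking payoff from each additional win is easy to see with a little arithmetic. This is a small Python sketch; the helper names are made up, and the 6-6 record is a hypothetical comparison point, not any team in this post.

```python
def wp(wins, losses):
    """Winning percentage as a fraction."""
    return wins / (wins + losses)


def gain_from_next_win(wins, losses):
    """How much WP rises if the next game is a win."""
    return wp(wins + 1, losses) - wp(wins, losses)


# Prairie's 10-2 record versus a hypothetical 6-6 team.
print(round(gain_from_next_win(10, 2), 4))
print(round(gain_from_next_win(6, 6), 4))
```

The 10-2 team gains about a third as much WP from its next win as the 6-6 team does.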

Comparing the non-league opponents, Gig Harbor has a much stronger slate:

Gig Harbor			Prairie
opponent	WP	TBPts	opponent	WP	TBPts
Kentlake	.769	30.2	Tumwater	.273	-5.4
Curtis	.750	25.0	Camas	.615	35.8
Kentridge	.923	46.4	Battle Ground	.250	-0.9
Snohomish	.833	33.2	Skyview	.357	18.7
Lynden Christian	1.000	45.5	Union	.769	27.0
Garfield*	.750	47.8	Whitney (CA)	.643	N/A
			S. San Fran (CA)	.727	N/A
			San Joaquin Mem (CA)	.667	N/A
			W.F. West	.909	48.3
			Rogers* (Puyallup)	.615	24.4

* not yet played

Each opponent’s WP is calculated without the games involving Gig Harbor or Prairie.

For league opponents, start with 0.500 for the average league opponent for league games not involving Prairie or Gig Harbor. Then combine with their WP for non-league games. The rest of Gig Harbor’s league is a combined 30-9 in non-league games. The rest of Prairie’s league is a combined 15-33. Gig Harbor’s non-league OWP is very high, and is likely to drop as league games are played. But the 30-9 record will cushion that drop. Prairie’s OWP will also drop but the 15-33 record will accelerate the decline.
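Here is a rough Python sketch of that blending. It pools the whole rest of the league into one record, which is a simplification: the actual OWP averages each opponent's WP. The function name and the count of 20 league games are assumptions for illustration only.

```python
def pooled_league_wp(nonleague_wins, nonleague_losses, league_games):
    """Pooled WP for the rest of a league.

    league_games counts games played among those opponents only. Each such
    game adds one win and one loss to the pool, a 0.500 contribution.
    """
    wins = nonleague_wins + league_games
    total = nonleague_wins + nonleague_losses + 2 * league_games
    return wins / total


# Non-league records from the post; 20 league games is a hypothetical count.
for label, w, l in [("rest of SSC", 30, 9), ("rest of GSHL 3A", 15, 33)]:
    before = pooled_league_wp(w, l, league_games=0)
    after = pooled_league_wp(w, l, league_games=20)
    print(label, round(before, 3), round(after, 3))
```

However many league games get added, the 30-9 pool stays above .500 and the 15-33 pool stays below it. That is why the non-league record of a league keeps mattering after league play starts.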

Since league opponents have a lot of opponents in common (they’re in the same league) the overall record of a league in non-league games impacts the OOWP. Again the weakness of Greater St. Helens 3A will pull down Prairie’s OOWP. The strength of SSC will help Gig Harbor’s OOWP.

There is no better 3A league in non-league games than South Sound Conference. There is no worse 3A league for non-league games than Greater St. Helens 3A. Gig Harbor will likely remain highly ranked in RPI. Prairie is actually likely to fall as league games against the very weak GSHL 3A start to count in RPI.

So howcum Gig Harbor is #1 for RPI? They belong there.

Did Gig Harbor schedule for RPI? I think so. But a lot of other teams did too. When you’re rewarding people based on a measurement, and people know what that measurement is, you can expect people to try to optimize that measurement. I’m surprised that Prairie did a poor job of non-league scheduling. Prairie knows it is in a weak league. They’ve known that for decades. The only way they can counteract that for RPI purposes is the non-league schedule. Or maybe Prairie realizes that no matter what they do, they won’t be in the top 8 in RPI anyway.

Cross-classification scheduling

A recent post prompted an out-of-band exchange regarding scheduling games between classifications. The effect that scheduling itself has on rankings is a subject of interest to me. It is complex and requires a lot more time than I can devote during the season when, on some days, I spend five or more hours getting the previous day’s games into my database. But since I have that database I can ask it questions. It takes some time to formulate just how to ask, and understand just what that answer is telling me.

One question prompted by that exchange was just who schedules non-league games with teams in other classifications. League games are massively scheduled between teams in small subsets of the approximately 380 schools fielding varsity basketball teams. There really isn’t any point in using a ranking algorithm to determine who is the best team in a league when everybody plays everybody else one, two, even three times. That’s what the win-loss record is for. It is very easy to understand. The schedule is balanced, in most cases. The schedule for non-league games is not at all designed to even have the appearance of randomness. Thank goodness. I don’t want to see a random set of games selecting a 3A team to play a 1B team. I know that the only way Chief Kitsap girls will beat Bethel would be for the Tacoma Narrows bridge to collapse again while Bethel is driving across. The stat geek in me would find an experimentally designed schedule appealing. The fan in me would not. We get the non-league games that happen. They form the critical schedule structure for driving rankings of teams in different leagues and districts.

For starters, I looked at girls data for 2006-2007. That is the first season with the six current WIAA classifications. I did a lot of this by hand so hope I didn’t mis-transcribe. The inter-classification W-L table looks like

	v 4A	v 3A	v 2A	v 1A	v 2B	v 1B
4A		112-51	14-14	0-1	0-1
3A	51-112		34-45	3-3	3-1	1-0
2A	14-14	45-34		68-34	9-5	3-0
1A	1-0	3-3	34-68		55-50	14-9
2B	1-0	1-3	5-9	50-55		81-84
1B		0-1	0-3	9-14	84-81

Looking at the yellow cells, where the classification difference is at least two, there are 81 games. The higher classification team won 47 and the lower classification team won 34. This season is also interesting for how badly 3A fared against adjacent classifications, but I didn’t look at those. Examining the 81 games was enough for me this week. Leave that for another day. Here is the list of the 81 games.
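As a cross-check on those totals (this is not the game list itself), the table can be tallied in a few lines of Python. It is only a sketch: the records are re-typed from the table above, and the names `TABLE`, `mirror_errors`, and `gap_totals` are made up for the illustration.

```python
# Classifications ordered from largest to smallest.
CLASSES = ["4A", "3A", "2A", "1A", "2B", "1B"]

# TABLE[row][col] = (wins, losses) from the row classification's point of view.
# Blank cells in the table (no games played) are simply omitted.
TABLE = {
    "4A": {"3A": (112, 51), "2A": (14, 14), "1A": (0, 1), "2B": (0, 1)},
    "3A": {"4A": (51, 112), "2A": (34, 45), "1A": (3, 3), "2B": (3, 1), "1B": (1, 0)},
    "2A": {"4A": (14, 14), "3A": (45, 34), "1A": (68, 34), "2B": (9, 5), "1B": (3, 0)},
    "1A": {"4A": (1, 0), "3A": (3, 3), "2A": (34, 68), "2B": (55, 50), "1B": (14, 9)},
    "2B": {"4A": (1, 0), "3A": (1, 3), "2A": (5, 9), "1A": (50, 55), "1B": (81, 84)},
    "1B": {"3A": (0, 1), "2A": (0, 3), "1A": (9, 14), "2B": (84, 81)},
}


def mirror_errors(table):
    """Return cells whose record is not the reverse of the opposite cell."""
    errors = []
    for row, cells in table.items():
        for col, (wins, losses) in cells.items():
            if table.get(col, {}).get(row) != (losses, wins):
                errors.append((row, col))
    return errors


def gap_totals(table, min_gap=2):
    """Total wins by the higher and by the lower classification
    in games where the classification gap is at least min_gap."""
    higher_wins = lower_wins = 0
    for i, row in enumerate(CLASSES):
        for col in CLASSES[i + min_gap:]:
            wins, losses = table[row].get(col, (0, 0))
            higher_wins += wins
            lower_wins += losses
    return higher_wins, lower_wins


print(mirror_errors(TABLE))  # an empty list means the two halves agree
higher, lower = gap_totals(TABLE)
print(higher + lower, higher, lower)
```

The mirror check is a cheap guard against transcription slips, since every cell above the diagonal should be the reverse of its partner below.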

Thirty 4A teams scheduled down two or more classifications (some teams may count more than once). In so doing, 4A went 14-16. Using the 2006-2007 TeamBrunnhilde points rating, eight of these were in the top half of the classification, 22 in the bottom half, 16 in the bottom quarter. Clearly not a representative slice of 4A teams doing battle with 2A, 1A, and 2B teams. Of the 28 2A teams that scheduled up by two classifications (all against 4A), half were from the top half of that classification, 7 in the top 10. 2A fielded a lot better lineup going up against 4A.

Look at 3A. Eleven teams scheduled down. Three in the top half (#24 twice and #27), eight in the bottom half and six of those in bottom quarter.

2A scheduled down 14 times. Only three of the 14 were from the top half. Four were from the bottom three 2A teams. Klahowya, #54 (last), an epically dreadful team in the midst of a four-year period in which it won one (1) game, managed to lose twice to the #60 1B team, Quilcene. Quilcene, though, rated 3 points ahead of Klahowya, so these were not upsets.

Fourteen of the 1A teams scheduling down came from the top half of the classification. More than half. But where do those 1A teams live? Four games for Colfax, others from eastern Washington. People may be sparse; good girls basketball teams aren’t. Next town over might be 1B but the team is good. Seattle Christian (18) and Forks (21) were the only top-halfers west of the mountains. Okanogan (17) lost to Entiat (1B #6) twice: Entiat was a six-point favorite anyway.

Overall for the 81 games, 28 down schedulers came from the top half of their classifications (4A, 3A, 2A, 1A); 45 up schedulers came from the top half of their classifications (2A, 1A, 2B, 1B). This is what you would expect. Coaches don’t want to schedule a slate of mismatches.

If you had two hats, one with names of teams, from say 4A and the other from 2A, and pulled a name from each hat: who would win? Knowing nothing else, I would guess the 4A team. But if you had a hat with the games that were actually scheduled and asked who would win, that’s a different question: 50-50 for this season. It isn’t a random schedule–not for any of the cross-classification comparisons (2 or more differences). In theory, giving extra credit to the lower classification team winning or even playing the game seems logical. In practice it doesn’t necessarily work out.

But that’s not the end. What about the adjacent classification games? Funny that I picked 2006-2007 because that looks really interesting. What about next year, and the next? Just because something shows up one season doesn’t mean it applies to others. Now that there is the RPI incentive to making schedule arrangements, how has that changed scheduling? Then there is geographical asymmetry. Good teams are not evenly spread across the state, but most games are close by, posing difficulties for making cross-state comparisons. Lots of questions. But I’ve got a game to catch this evening.

Cross classification wins and losses

A recent commentary by Tim Martinez in the Vancouver Columbian whined about the Prairie girls being ripped off by the RPI rating the third week into the season. I agree that Prairie is a lot better than the RPI rank. However, the column wandered from there into ‘fixing’ RPI by including factors for classification. So is this a problem? Are crappy teams in a higher division gaming the system (intentionally or not) by beating up on even crappier teams in lower divisions? When a 4A team plays a 1A team should the 4A team be penalized?

Looking at last season, here are the cross-classification W-L records for girls.

	v 4A	v 3A	v 2A	v 1A	v 2B	v 1B
4A		115-100	26-24	1-6	1-0	1-0
3A	100-115		81-60	11-21	2-3	0-0
2A	26-24	60-81		73-77	8-10	3-2
1A	6-1	21-11	77-73		56-38	25-17
2B	0-1	3-2	10-8	38-56		73-55
1B	0-1	0-0	2-3	17-25	55-73

Let’s look at those 1A v 4A games. Only seven, but non-league games are not scheduled according to randomized experimental design anyway. La Center had two wins over bad Heritage and Battle Ground teams. I guess those crappy 4A teams in Clark County should look for a real 1A pushover instead of La Center. Zillah beat Davis. Seattle Christian beat Auburn and Mount Rainier. Good 1A teams beating not good 4A teams. Cascade Christian beat Federal Way: a mediocre 1A team over a bad 4A team. And then Sunnyside beat Zillah. A good non-league matchup between two good teams, one is 4A and the other 1A, but both good.

How about this year? Here’s the table for games so far:

	v 4A	v 3A	v 2A	v 1A	v 2B	v 1B
4A		69-62	27-22	4-4	0-0	0-0
3A	62-69		65-47	14-13	1-1	0-0
2A	22-27	47-65		42-45	9-3	1-2
1A	4-4	13-14	45-42		62-25	16-12
2B	0-0	1-1	3-9	25-62		33-42
1B	0-0	0-0	2-1	12-16	42-33

Best game I’ve seen so far was Sunnyside Christian (1B) at Lynden (2A). Sunnyside Christian won, not an upset. Not a black mark against Lynden for losing. Tim Martinez would put the ‘scarlet RPI’ on Lynden’s warmups for even scheduling that game. Maybe Hudson’s Bay has played several lower classification teams. But they look to be teams roughly well matched to Hudson’s Bay. You want to schedule some games that are likely wins where you can successfully exercise your skills against a live opponent. You want some games that will be challenges. And other games where you’re well matched. Regardless of classification.

Should a ranking method specifically include classification as a factor? There is a wide spread between good and bad teams within a classification, and a great deal of overlap between capabilities of teams in differing classifications. RPI is already a Rube Goldberg ranking. Hammering in a classification factor that doesn’t reflect reality would just make it worse.