RPI, the Ratings Percentage Index, is a rating summarizing a team's performance against its opponents. RPI is a weighted average of a team's own winning percentage, the winning percentage of the team's opponents, and the winning percentage of the opponents' opponents. There are several different RPI formulas, which differ in the weights given to the three winning percentages and also in the weightings for home and away wins and losses.
RPI only considers whether a team wins or loses a game. The margin of victory is irrelevant. Although RPI advocates might say RPI considers who a team plays, it does not; it considers only the winning percentages of whom a team plays. A team that is 10-10 in the Columbia Valley league counts the same as a team that is 10-10 in the Greater Spokane league.
For 2016-2017 through 2019-2020, the WIAA used RPI to determine seeding for the state tournaments. The WIAA revised the formula and/or dataset for 2017-2018 and again for 2018-2019. For 2018-2019, the WIAA used 0.40 * team winning percentage + 0.40 * opponent winning percentage + 0.20 * opponents' opponents' winning percentage.
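To make the arithmetic concrete, here is a minimal sketch of the weighted-average calculation, using the 2018-2019 WIAA weights as the default. The function name and the example inputs are mine, for illustration only; the WIAA publishes the weights, not code.

```python
def rpi(wp, owp, oowp, weights=(0.40, 0.40, 0.20)):
    """Weighted average of a team's winning percentage (wp), its
    opponents' winning percentage (owp), and its opponents' opponents'
    winning percentage (oowp)."""
    w1, w2, w3 = weights
    return w1 * wp + w2 * owp + w3 * oowp

# A team at 0.750 whose opponents are at 0.500 and whose opponents'
# opponents are at 0.520, under the 2018-2019 WIAA weights:
print(round(rpi(0.750, 0.500, 0.520), 3))  # 0.604
```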
In June 2018, I ran a logistic regression on girls and boys games for each season 2006-2007 through 2017-2018. This was to determine an optimum set of weights for winning percentage (WP), opponents' winning percentage (OWP), and opponents' opponents' winning percentage (OOWP). Each year and gender had somewhat different optimum weights. I settled on weights of 0.40 for WP and 0.60 for OWP. I decided to drop OOWP, although it was statistically significant, since OOWP is practically impossible to obtain consistently for out-of-state teams. Also, when examining the percentage of games actually 'picked' correctly using RPI, weights of 0.40, 0.60, and 0 for WP, OWP, and OOWP fared only slightly worse than the year-to-year optimum weights determined by the logistic regression.
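The fit itself isn't shown here, but the approach is roughly the following: treat each game as one observation, with the differences between the two teams in WP, OWP, and OOWP as predictors and the outcome as the response, then read relative weights off the fitted coefficients. This is a sketch with synthetic data standing in for the real schedule database, so the numbers it prints are not the weights reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for a season of games: each row holds the
# (team minus opponent) differences in WP, OWP, and OOWP;
# y = 1 if the team won.  Real data would come from the schedules.
n = 2000
X = rng.normal(0.0, 0.2, size=(n, 3))      # [wp_diff, owp_diff, oowp_diff]
true_w = np.array([8.0, 5.0, 1.5])         # arbitrary "true" influences
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_w)))

model = LogisticRegression(fit_intercept=False).fit(X, y)

# Normalizing the coefficients to sum to 1 gives relative weights
# comparable to an RPI weighting of WP, OWP, and OOWP.
coefs = model.coef_[0]
print(dict(zip(["WP", "OWP", "OOWP"], (coefs / coefs.sum()).round(2))))
```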
In looking at various RPI formulations, I was surprised at how closely many different weightings performed when checking how accurate they were at picking actual game outcomes. I concluded that one would really have to screw up royally to get under 80% right. Of course, that still means 20% of games are picked wrong. So I stopped worrying about the right RPI formulation. It really doesn't matter much. Do whatever 'feels' right. Of course, RPI itself might not be the best ranking criterion. But tweaking the RPI formula doesn't do much to make it righter or wronger.
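For reference, the comparison is simply: for each candidate weighting, compute both teams' RPI from their components and count the fraction of games in which the higher-RPI team actually won. A sketch of that scoring function, with a hypothetical data layout (each game as a pair of (WP, OWP, OOWP) tuples, winner first):

```python
def pick_accuracy(games, weights):
    """Fraction of games in which the team with the higher RPI
    (under `weights`) was the actual winner.  Each game is a
    (winner_components, loser_components) pair of (WP, OWP, OOWP) tuples."""
    w1, w2, w3 = weights
    rpi = lambda c: w1 * c[0] + w2 * c[1] + w3 * c[2]
    decided = [(w, l) for w, l in games if rpi(w) != rpi(l)]  # skip RPI ties
    return sum(1 for w, l in decided if rpi(w) > rpi(l)) / len(decided)

# Comparing two weightings over a (hypothetical) list of games:
# pick_accuracy(games, (0.40, 0.60, 0.00))
# pick_accuracy(games, (0.25, 0.50, 0.25))
```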
The formula I use is 0.40 * team winning percentage + 0.60 * opponents' winning percentage. I do not adjust for home/away. When determining opponent winning percentage (and opponents' opponents' winning percentage), I now exclude games involving the team whose RPI is being computed. I have recomputed RPI for previous seasons using this formula.
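Here is a minimal sketch of that calculation, assuming game results as simple (winner, loser) pairs; the function name and data structures are mine, for illustration only.

```python
from collections import defaultdict

def tbrpi(results, team, w_wp=0.40, w_owp=0.60):
    """0.40 * WP + 0.60 * OWP, where each opponent's winning percentage
    excludes that opponent's games against `team` itself.
    `results` is a list of (winner, loser) pairs."""
    wins, losses = defaultdict(int), defaultdict(int)
    opponents = []                       # one entry per game played by `team`
    for winner, loser in results:
        wins[winner] += 1
        losses[loser] += 1
        if winner == team:
            opponents.append(loser)
        elif loser == team:
            opponents.append(winner)

    def wp(t, exclude=None):
        w, l = wins[t], losses[t]
        if exclude is not None:          # drop games t played against `exclude`
            w -= sum(1 for a, b in results if a == t and b == exclude)
            l -= sum(1 for a, b in results if b == t and a == exclude)
        return w / (w + l) if (w + l) else 0.0

    owp = sum(wp(o, exclude=team) for o in opponents) / len(opponents)
    return w_wp * wp(team) + w_owp * owp

# Example: A beats B, B beats C, A beats C, C beats B.
results = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "B")]
print(round(tbrpi(results, "A"), 3))  # 0.4 * 1.0 + 0.6 * 0.5 = 0.7
```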
In adjusting for home/away, the idea would be to weight according to the proportion of home wins expected. Since the historical figure for both boys and girls is about 55%, the adjustment would be a factor of 1.1 (0.55/0.50) for road wins and home losses, and 0.9 (0.45/0.50) for home wins and road losses. Close enough to 1.0 for me. WIAA RPI does not adjust for home and away.
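For completeness, the adjustment as described would be applied to the winning percentage before it enters the formula, along these lines (a sketch of the idea above, not something I actually use):

```python
def adjusted_wp(home_w, home_l, road_w, road_l):
    """Winning percentage with road wins and home losses weighted by 1.1
    and home wins and road losses weighted by 0.9."""
    wins = 0.9 * home_w + 1.1 * road_w
    losses = 1.1 * home_l + 0.9 * road_l
    return wins / (wins + losses)

# Two 10-10 teams: one playing every game at home, one every game on the road.
print(round(adjusted_wp(10, 10, 0, 0), 2))  # 0.45
print(round(adjusted_wp(0, 0, 10, 10), 2))  # 0.55
```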
Beginning with 2016-2017, I've been looking up the records of out-of-state and other out-of-scope teams (Washington JV and C teams), and going back to previous seasons where possible. If you see a record next to an out-of-scope team on the schedule pages, that is the record used in my RPI calculation. If there isn't a record, no opponent W-L is used for that team. Oregon teams' records are obtained from OSAA, which has the official database used for Oregon RPI calculations for state tournament seeding. Idaho teams' records are obtained from idahosports.com. Other U.S. varsity teams' records are obtained from MaxPreps, which may or may not be correct, although for a lot of teams it isn't bad (historically, it is bad for Idaho). Many Washington JV and C team records are obtained from WPAN. Old out-of-scope won-loss records (from the 1980s and 1990s, for example) come from newspaper archives, so they are not widely available.
The scheduling topology problem greatly affects RPI. Most of the games a team plays are within its league. Most of the non-league games are against close-by teams. Comparing district 3 with district 8 using RPI is stupid. Comparison of RPI between classifications would be misleading. The teams just don't play each other and likely play few, if any, common opponents. And with RPI, those few games are overwhelmed by the others. There is no regard for games that compare subsets of teams. These games are just thrown in with the others and averaged to unimportance. The statistical linear model used for the teambrunnhilde points rating is designed to utilize these high-leverage games in determining the rating.
As the season progresses, the opponent winning percentage and opponents' opponents' winning percentage will tend to the mean (0.500), which will push the RPI toward the mean as well. Consider an RPI built from just league games: in a league schedule with an equal number of games between teams, once games against the rated team are excluded, its opponents' combined record against one another is 0.500, so the league-game RPI reduces to a function of merely the rated team's own W-L. Furthermore, for league games the overall league winning percentage is 0.500 (by definition), so the league RPI tends to the mean. Although the team's overall RPI probably cannot be written as a sum of a league-game RPI and a non-league-game RPI (I haven't done the algebra), having a large portion of the games with a tendency to the mean seems to contradict the rationale for RPI in the first place. Some leagues' schedules are mostly league games (the CWAC, for example, is about 90% league games), while others are as low as 30% or less.
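The claim about a balanced league schedule is easy to check numerically: in a double round robin, with games against the rated team excluded, every team's OWP comes out 0.500, so a league-only RPI (here with my 0.40/0.60 weights) is just a rescaling of the team's own winning percentage. A small simulation sketch:

```python
import itertools, random

random.seed(1)
teams = list("ABCDEFGH")

# Double round robin with random winners: every pair plays twice.
results = []
for a, b in itertools.combinations(teams, 2):
    for _ in range(2):
        results.append((a, b) if random.random() < 0.5 else (b, a))

def wp(t, exclude=None):
    games = [g for g in results if t in g and (exclude is None or exclude not in g)]
    return sum(1 for winner, _ in games if winner == t) / len(games)

for t in teams:
    opponents = [b if a == t else a for a, b in results if t in (a, b)]
    owp = sum(wp(o, exclude=t) for o in opponents) / len(opponents)
    # OWP is 0.500 for every team, so 0.40*WP + 0.60*OWP = 0.40*WP + 0.30.
    print(t, round(wp(t), 3), round(owp, 3))
```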
Anyway, I don't like RPI much, but it is an often-used metric, so I include it.
I denote my RPI as TBRPI (teambrunnhilde RPI) to differentiate it from the WIAA RPI.