Are Grand Chelem tournaments for left-handed players?

Rafael Nadal is perhaps the most famous left-handed player on the courts right now. But the competition form right-hands is fierce: Roger Federer and Novak Djokovic, with no doubt ones of the most frightening opponents, are right-handed.

In this little exercice we will use Google Sheets and the add-on function IMPORTFROMWEB() to see who is more willing to win a Grand Chelem final…

Use the add-on
IMPORTFROMWEB 
for importing elements from any webpage directly in a Google Spreadsheet.

Search the web

First, we’ll have to search the web and see how we can find the pieces of information we need.

From the web to Google Sheets…

Let’s open a new Spreadsheet. For once, we will not dread the blank sheet!

Let’s start by importing the ESPN table.

Obviously we could use copy/paste, but what if we want our data being updated for future tournaments?

The solution is to monitor the page by using the IMPORTFROMWEB() function:

In A1 we will put the URL of the ESPN page

A1http://www.espn.com/tennis/history

In A2, B2, B3, B4, the XPaths of each column of the table

A2
//tr[not(@class=”stathead”)]//td[1]
B2//tr[not(@class=”stathead”)]//td[2]
C2//tr[not(@class=”stathead”)]//td[3]
D2//tr[not(@class=”stathead”)]//td[4]

If you’re not comfortable with XPaths, don’t worry. You will learn it quickly as long as you understand basic HTML. Have a look at this article:

Find an XPath with little HTML knowledge

[not(@class="stathead")] has been added in order to remove the first row of the table that was misaligning the cells (try and see!)

Now in A4, we will run our function

A4=IMPORTFROMWEB(A1, A2:D2)

…And let the magic happen!
If you’re all good the table from ESPN should appear in your Google Sheets

Link it to Wikipedia

We want to retrieve the hand style for each player. Which means that we will scrape the wikipedia page of each player. That’s almost 200 pages

For this purpose, let’s open a new sheet in the same spreadsheet. This sheet will be a list of all the players we have in the previous table. To do that use the following formula

A2=UNIQUE({Sheet1!C5:C;Sheet1!D5:D})

In Column B, we will generate the Wikipedia URL for each player.
Urls in Wikipedia are not rocket science. In our case, it is just https://en.wikipedia.org/wiki/Name_Surname

In B2 type:

B2=”https://en.wikipedia.org/wiki/”&SUBSTITUTE(A1:A,” “, “_”)

And extend the formula for each player

Now that we have the urls, we just need the XPath corresponding to the hand information.

C1//*[text()=”Plays”]//following-sibling::*

Geese, that one looks more complicated.
It looks for the element that has the text “Plays” and returns its neighbour.
These relative XPaths may look cumbersome, but they are also the most robust ones: Even if the source code changes slightly, it’s very probable that the information we are looking for will keep being near “Plays”

Take a look at this XPaths cheatsheet for a better understanding

Next, let IMPORTFROMWEB do its work
Let’s create a function for the last tournament

C2=IMPORTFROMWEB(B2,$C$1)

Then extend for each player and wait! That will scrape about 200 pages

You’ll see that Wikipedia provides something like
“Right-handed (two-handed backhand)”, so we will have to do a bit of text manipulation in column D so it displays only the first part of the text (“Right-handed”)

D2=TRIM(INDEX(SPLIT(C2,”(“),1))

Now extend the formula to all players.

Joining the two tables

Let’s go back to our first sheet. We’ll add the hand style to our main table.
To join tables let’s use the VLOOKUP function which we will apply to each final, in the E column for the winners and in the F column for the runner-up.

For example in row 5

E5=VLOOKUP(D5,Sheet2!$A$2:$C, 3, false)
F5=VLOOKUP(D5,Sheet2!$A$2:$C, 3, false)

You should now be able to see for each final if a left handed player usually beat a right-handed or the contrary

Extend to all finals

So what? Who win? Left-handed or right-handed?

In the last step, we will show in column G:

  • TRUE if a final has been won by a right-handed player against a left-handed
  • FALSE if a final has been won by a left-handed player against a right-handed
G5=IFS(
AND(E5=”Right-handed”,F5=”Left-handed”, TRUE,
AND(E5=”Left-handed”,F5=”Right-handed”),FALSE
)

Then we just have to count the TRUE value vs. the FALSE values

=COUNTIF(G5:G,TRUE)Result: 48
=COUNTIF(G5:G,FALSE)Result: 52

Conclusion: Right-handed players have won more Grand-Chelem finals against left-handed players than the opposite.

On a side note, beware that when you look carefully the data, you’ll find errors like #URL_RETURNS_404 or #XPATH_RETURNS_NULL. That’s because in some cases Wikipedia didn’t find a page with this URL or that the page didn’t provide the information about the hand style

Find the whole tutorial in this shared Spreadsheet
https://docs.google.com/spreadsheets/d/1u3iUn1AbxpoqGjo4XF7GNwBfdGbI2EARYA1TTojOSGA/edit?usp=sharing