Tue Dec 17 2019

Pokédex Builder


GitHub Repo

I decided to write my own minimalistic Pokédex. You never know what you need until you need it.

A day ago I published this repo, which gave me the idea to scrape the necessary information from different resources and create my own database. Here is a sneak peek of how I structured the data.

def make_table(cleaned_table):

	true_table = limit_table(cleaned_table)

	pokedex_data = []
	training = []
	breeding = []
	base_stats = []

	# Adding information to Pokedex Data table
	for p in range(0, 14):
		pokedex_data.append(true_table[p])

	# Adding information to Training table
	for t in range(14, 24):
		training.append(true_table[t])

	# Adding information to Breeding table
	for b in range(24, 30):
		breeding.append(true_table[b])

	# Adding information to Base Stats table
	# Note: I grouped them up in four; header, base stat, min and max
	for bs in range(30, 54, 4):
		if bs == len(true_table)-2:
			total = [true_table[bs], true_table[bs+1]]
			base_stats.append(total)
			break
		stat = [true_table[bs], true_table[bs+1], true_table[bs+2], true_table[bs+3]]
		base_stats.append(stat)

	return pokedex_data, training, breeding, base_stats
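
For context, make_table is meant to be unpacked into the four lists at once. The call below is illustrative: cleaned_table stands in for the flat list of table cells the scraper collects earlier, and the values in the comment are just an example of the shape.

pokedex_data, training, breeding, base_stats = make_table(cleaned_table)

# base_stats now holds groups of [header, base stat, min, max],
# e.g. ['HP', '45', '200', '294'], plus a final two-element group
# (presumably the Total row) at the end.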

Although the Pokédex app is currently private since it's still in development, I am sharing the scraping algorithm and the current data publicly for everyone to use. Note that this is an ongoing project and the data set is expanding regularly.

I will soon add a client file to create objects and access the generic data easily.
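
I haven't settled on the exact shape of that client yet, but the idea is roughly the sketch below. The Pokemon class, the pokedex.csv file name and the column names are placeholders, not the final interface.

import csv

# Hypothetical client: wrap each CSV row in a small object.
# The file name and column names here are placeholders.
class Pokemon:
	def __init__(self, row):
		self.name = row["name"]
		self.types = row["types"]

def load_pokedex(path="pokedex.csv"):
	with open(path, newline="", encoding="utf-8") as f:
		return [Pokemon(row) for row in csv.DictReader(f)]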

Huge thanks to PokemonDB for providing the information.


Boring Bits and Some Footnotes

Naming

I spent hours making this scraper fully compatible with all the Pokémon. The reason some of them were failing is that they are displayed with special characters but follow different naming conventions in the official database. And although nearly all Pokémon have genders, some of them (for instance Nidoran) have different looks and evolutions for each gender.

[Image: the Pokédex entries for Nidoran♀ and Nidoran♂, showing the gender symbols in their names]

As can be seen above, Nidoran has the gender characters ♀ and ♂ in their names. Although they are officially represented this way, I realised that the official database keeps Nidoran♀ as nidoran-f and Nidoran♂ as nidoran-m. So I had to follow their conventions (though there are actually no rules) and slugify the naming on my own, like this:

if pokemon == "Nidoran♀":
	pokemon = "nidoran-f"

if pokemon == "Nidoran♂":
	pokemon = "nidoran-m"

But this is not the end, because there is no official rule for slugifying these names. Mime Jr. is represented as mime-jr, which is fair enough. And even though Mr. Mime and Type: Null (I didn't know there was a Pokémon named Type: Null) follow a very similar slug convention, the Tapu Pokémon have some blank spaces casually left in their names for God knows why. Maybe it is something related to the Japanese names of these Pokémon.

# Extra caution for Mr. Mime and his bros
	elif "." in namelist:
		oldname = pokemon.split(". ")
		pokemon = oldname[0] + "-" + oldname[1]
	# For the Pokemon Type: Null
	elif ":" in namelist:
		oldname = pokemon.split(": ")
		pokemon = oldname[0] + "-" + oldname[1]
	# Tapu Pokemon
	elif " " in namelist:
		oldname = pokemon.split(" ")
		pokemon = oldname[0] + "-" + oldname[1]

Thus, I had to create a helper function that manually checks whether the current Pokémon's name contains special characters and, if so, changes the naming myself.

# Extra caution for farfetch'd and his mates
	namelist = list(pokemon)
	if "'" in namelist:
		oldname = pokemon.split("'")
		pokemon = ""
		for chars in oldname:
			pokemon += chars
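
Putting all of these branches together, a simplified, self-contained version of the naming helper (run_precaution, which the fetching code below calls) looks roughly like this. The branch order, the lowercasing at the end and the dot handling are simplifications for this post, not a verbatim copy of the repo.

# Simplified sketch of the naming helper; details may differ from the repo
def run_precaution(pokemon):
	namelist = list(pokemon)

	# The Nidoran line keeps its gender as a suffix in the database
	if pokemon == "Nidoran♀":
		pokemon = "nidoran-f"
	elif pokemon == "Nidoran♂":
		pokemon = "nidoran-m"
	# Farfetch'd and his mates: drop the apostrophe
	elif "'" in namelist:
		pokemon = pokemon.replace("'", "")
	# Mr. Mime, Mime Jr. and co.: drop the dot, hyphenate the space
	elif "." in namelist:
		pokemon = pokemon.replace(".", "").replace(" ", "-")
	# Type: Null
	elif ":" in namelist:
		oldname = pokemon.split(": ")
		pokemon = oldname[0] + "-" + oldname[1]
	# Tapu Pokemon and any other name with a blank space left in it
	elif " " in namelist:
		oldname = pokemon.split(" ")
		pokemon = oldname[0] + "-" + oldname[1]

	# Assuming the URL slugs are all lowercase
	return pokemon.lower()

Only the first matching branch fires, which is enough here because each problem name falls into exactly one of these buckets.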

Making HTML Pages

On top of the precautions I've listed above, I had to construct the request for each individual Pokémon's HTML page myself before scraping it.

from urllib.request import Request, urlopen

def make_html(pokemon):

	pokemon = run_precaution(pokemon)

	# Reaching the page and getting the HTML
	user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
	headers = { 'User-Agent': user_agent }
	url = Request(
		"https://pokemondb.net/pokedex/{}".format(pokemon), None, headers)

	return urlopen(url)

This was handy, because I could then quickly pull everything I needed out of these pages with the BeautifulSoup library.

from bs4 import BeautifulSoup

# html is the response returned by make_html(pokemon)
soup = BeautifulSoup(html.read(), "html5lib")
for span in soup.findAll('span', attrs={'class': 'infocard-lg-data'}):
	for name in span.findAll('a', attrs={'class': 'ent-name'}):
		# Pokemon Name
		name_dict.append(name.text)
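
The same soup object is also what eventually feeds make_table from the beginning of the post. I haven't shown the cleaning step here, but conceptually it is something along these lines; as far as I can tell, PokemonDB marks those info tables with the vitals-table class, and limit_table still does the actual trimming inside make_table.

# Collect the text of every cell from the page's info tables into the
# flat list that make_table() consumes; the cleaning is simplified here
cleaned_table = []
for table in soup.findAll('table', attrs={'class': 'vitals-table'}):
	for cell in table.findAll(['th', 'td']):
		cleaned_table.append(cell.text.strip())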

Error Handling

Since we are talking about 900+ Pokémon, the scraper actually makes 900+ requests, one for each entry. Therefore, the code sometimes faces inevitable errors such as connection issues, on either my end or the remote server's. I found a silly way to deflect this issue.

# This if statement is my own way of error handling, please don't ask any questions
	if first_position == 0:
		move_table = {
			"move_level" 	: "move_level",
			"move_name" 	: "SOMETHING'S WRONG",
		}
		return move_table

I just hit Cmd+F in the CSV file and search for "SOMETHING'S WRONG". If there is an entry, I restart the data update.
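
The same check can also be scripted instead of Cmd+F. A small sketch, assuming the data lives in a file called pokedex.csv (the actual file name may differ):

import csv

# Report the rows that contain the "SOMETHING'S WRONG" marker so the
# affected Pokémon can be re-scraped
def find_failed_rows(path="pokedex.csv"):
	failed = []
	with open(path, newline="", encoding="utf-8") as f:
		for i, row in enumerate(csv.reader(f)):
			if "SOMETHING'S WRONG" in row:
				failed.append(i)
	return failed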

Conclusion

I am surprised myself at how the overall Pokémon database is structured like spaghetti. Developers like to overcomplicate things and try to think outside the box, because that's what has been taught in school for years. However, in reality, some of the most successful products are there because someone just "made stuff".


Updates

21/12/2019

I have now added a lot of useful information from the tables. The CSV file contains more than enough data for my app, but I will try to scrape more in the future. Next, I will create a client file so other people can access the information easily.

23/12/2019

I had a chance to update evolution levels. I will also add a table for moves learnt by leveling up. I plan to create a file, moves.csv, and add all the move details later.
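
A sketch of how those move rows could end up in moves.csv, assuming they are shaped like the move_table dictionary from the error-handling snippet (the real file will likely carry more columns):

import csv

# Hypothetical writer for the planned moves.csv, using the same keys
# as the move_table dictionary shown earlier
def write_moves(moves, path="moves.csv"):
	with open(path, "w", newline="", encoding="utf-8") as f:
		writer = csv.DictWriter(f, fieldnames=["move_level", "move_name"])
		writer.writeheader()
		writer.writerows(moves)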

23/12/2019

It was a boring night, so I completed the moves-learnt-by-leveling-up table. The code is getting a bit complicated, so I will split out some functions later on.
