Three common MySQL database design mistakes — PlanetScale

Brian Morrison II | February 13, 2024

Many years ago, I worked for a telematics company that ingested data from hundreds of thousands of devices worldwide. There was a point of incredible growth where we onboarded a customer that gave us a massive number of new devices and a huge bump in revenue. It was a great moment for the company's trajectory, but the increased amount of data being processed highlighted a massive flaw in our system.

The ID column of the data history table (which logged every event that occurred across all devices) was created with the INT data type, and it was quickly running out of space.

It wasn't an issue immediately, but if that column ran out of space, our entire system would come to a halt. Funnily enough, we built a quick tool called “the doomsday clock,” which would roughly calculate the date this would occur. Had we expected this, we would have designed the database with a different type that would have more easily accommodated growth like this, but of course, the results of our decisions are always more obvious in hindsight.

Let's take a look at this issue and a few other common database design mistakes when setting up your database.

Suboptimal data type

The scenario described in the intro of this article highlights the importance of selecting a data type that's big enough to accommodate your existing data, as well as any potential growth you might experience.

This applies to more than just numerical types, though. For example, if you attempted to write a string with 300 characters into a VARCHAR(255) column, MySQL would return an error and reject the write if it is in strict mode, which is the default setting for MySQL. If MySQL is not in strict mode, attempting to insert string data into a column that exceeds its length causes the data to be truncated, losing potentially important data.

Conversely, you can also select columns that store too much data. While this won't have as much of a negative impact as not having enough room, there are storage and performance implications with over-provisioning columns. Let's assume you have a column storing the US zip code, which is typically five digits. You could default to using INT for the column type (which stores a 32-bit integer), but you'd allocate far more storage than necessary. Utilizing a SMALLINT would be a better choice, as it stores a 16-bit integer and would be more than enough to store a zip code.

These are only a few small examples of selecting an inappropriate data type for your columns.

Missing or redundant indexes

Indexes in MySQL speed up data access by building a separate structure that's optimized to return data if the query's criteria match the configuration of that index.

Indexes are very important when designing a fast database. When indexes aren't utilized, any SQL queries that do not use pagination or provide a LIMIT will perform a scan on that table. When scanning, MySQL will start reading from the first row until it has found every row that matches the criteria. If you have a heavily used query on a particularly large table, repeatedly scanning the table can have massive negative performance implications.

On the other hand, you can have too many indexes as well.

Every index created will utilize additional storage, so having unused or duplicate indexes directly impacts how much you are paying for that storage. Whenever data is updated or inserted into a table with indexes, MySQL needs to update those indexes (along with their associated statistics) to ensure they are accurate regardless of whether they are used. This can be a time-intensive operation that can create a bad user experience.

Note

If you want to learn more about how to effectively use indexes, we have a whole section covering them in our MySQL for Developers course.

Improperly storing semi-structured data

Over the past 20 years, utilizing NoSQL to store semi-structured data has gained favor with companies that need to process vast amounts of data very quickly.

Plenty of dedicated solutions are available on the market for storing this kind of data. However, MySQL is actually very capable in this area as well. Most semi-structured data stored in a database is represented as JSON. The most obvious way to store this would be to store the string in a TEXT column, but this is definitely not the most optimal way.

MySQL has a dedicated JSON column type that is designed to store JSON in an efficient binary storage format.

Using JSON over TEXT has several key benefits. The first is that InnoDB, the most commonly used MySQL database engine, natively supports querying and filtering based on data within the JSON object stored in the column, removing the need to manually filter after results have been returned to your application code. MySQL also supports building indexes based on data within JSON. This enables fast searches, allowing you to return rows based on your queries more quickly.

Conclusion

Whatever happened to that ID column that was running out of space? Well, luckily enough, the column type was a signed integer, meaning we were able to reseed it to -2,147,483,648. This effectively doubled our ID capacity by assigning negative numbers as IDs and incrementing towards 0. It's not the prettiest solution, but it did help us avoid a rather large amount of downtime that would be required to update the schema for nearly all tables in our database.

Designing a database for growth is no simple task, and things can get out of hand in a hurry. We've touched on only a few potential database design mistakes here, but every MySQL use case is unique and has its own challenges.

If you've encountered design issues, tell us more on Twitter, and make sure to tag @planetscale!

推荐订阅源

Blog — PlanetScale

Suboptimal data type

Missing or redundant indexes

Improperly storing semi-structured data

Conclusion