What is Sharding? Definition and advantages of this method of data distribution
October 27, 2018
Big Data calls for advanced database management techniques. Today we look at sharding. What does this English term mean, and what are the advantages of this set of techniques? Here are some explanations.
Going back to the etymology of the word, a shard is a fragment of something broken, so sharding could be rendered as breaking apart or fragmenting.
For our part, when we talk about sharding, the first reference that comes to mind is the Harry Potter saga, in which the evil genius Voldemort splits his soul into seven pieces in order to survive at any cost and gain power.
Sharding: partitioning data:
Some sharding principles come close to this dubious analogy. Sharding is a set of data distribution methods that consist of dividing Big Data databases into smaller datasets in order to speed up their processing, make them easier to manage, or both.
Technically, it is about separating the data in order to host it on different servers, each running its own database engine. This cluster architecture lets web giants manage gigantic databases more easily. This is particularly the case for Amazon: the e-commerce site will, for example, separate order data from billing data. In addition to improving security, this speeds up processing because each server handles a smaller dataset and needs less hardware power than a single server holding everything, and it reduces hosting costs. Sharding has also been proposed as a way to scale blockchains, the technology behind cryptocurrencies such as Bitcoin and Ethereum, although in a classic blockchain every full node still keeps a copy of the whole ledger.
Indeed, maintaining a single large server requires more power and special care. It therefore usually costs a company more, because the server must be equipped with high-performance components and the company must hire engineers able to maintain the system.
Sharding lets you distribute data horizontally. The data shards, that is, the partitioned datasets, can be spread across servers with much more reasonable costs.
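To make the idea of horizontal distribution more concrete, here is a minimal sketch of one common way to decide which server holds a given record: hashing its key. The shard names and the `shard_for` and `distribute` helpers are hypothetical and chosen only for illustration; real systems often use more elaborate schemes (ranges, lookup tables, consistent hashing).

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a record key to one shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap around
    # the number of shards so every key lands on exactly one shard.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def distribute(keys, shards=SHARDS):
    """Group keys by the shard they are assigned to."""
    placement = {name: [] for name in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(20)]
    for shard, members in distribute(customers).items():
        print(shard, len(members))
```

The same key always maps to the same shard, so the application knows where to read and write without consulting every server. One known limitation of this simple modulo approach is that changing the number of shards moves most keys to a different server.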
A complexity to anticipate:
Once this mode of partitioning is understood, sharding becomes attractive for companies that struggle to manage Big Data databases. However, the complexity of certain cases must be taken into account before starting.
Companies that want to distribute their customer databases geographically can simply allocate one server to each defined zone. Things get more complicated if the same customer is present in several regions of the world.
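The geographic case can be sketched as follows. The zone names, server names, and the `server_for_customer` helper are hypothetical, and the "home region" rule for customers present in several regions is only one possible policy, assumed here for illustration.

```python
# Hypothetical mapping from geographic zone to the server that hosts it.
REGION_SERVERS = {"eu": "db-eu", "us": "db-us", "asia": "db-asia"}


def server_for_customer(customer_regions, home_region=None):
    """Return the server holding a customer's record.

    customer_regions: list of zones where the customer is present.
    home_region: zone chosen to own the record when there are several
                 (an assumed policy, not the only possible one).
    """
    if not customer_regions:
        raise ValueError("customer has no region")
    if len(customer_regions) == 1:
        # Simple case: one zone, one server.
        return REGION_SERVERS[customer_regions[0]]
    # Complicated case: the customer spans several zones, so a rule is
    # needed to decide which server owns the record.
    if home_region is None or home_region not in customer_regions:
        raise ValueError("multi-region customer needs a valid home region")
    return REGION_SERVERS[home_region]


if __name__ == "__main__":
    print(server_for_customer(["eu"]))
    print(server_for_customer(["eu", "us"], home_region="us"))
```

Without such a rule, the same customer could end up with diverging records on several servers, which is exactly the complication described above.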
The complexity depends on the number of datasets to be distributed, the type of data (structured or unstructured), and the architecture already in place. Some companies choose to adapt their existing database, while others develop proprietary NoSQL solutions. It must also be taken into account that sharding makes the system dependent on the interconnection between servers: some services cannot be accessed if one of the servers is under maintenance.
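The dependency on every server being reachable can be illustrated with a small sketch of a query that has to visit all shards. The in-memory `shards` dictionary stands in for real servers, and the `query_all` helper is hypothetical; it simply shows that a shard under maintenance leaves a hole in the result.

```python
def query_all(shards, available, predicate):
    """Run a filter over every shard and report the ones that could not be reached.

    shards: dict mapping shard name to its list of rows (stand-in for servers).
    available: set of shard names that are currently reachable.
    predicate: function applied to each row to decide whether to keep it.
    """
    results = []
    unavailable = []
    for name, rows in shards.items():
        if name not in available:
            # This shard is under maintenance: its rows are missing from the answer.
            unavailable.append(name)
            continue
        results.extend(row for row in rows if predicate(row))
    return results, unavailable


if __name__ == "__main__":
    shards = {"shard-0": [1, 2, 3], "shard-1": [4, 5, 6]}
    rows, missing = query_all(shards, {"shard-0"}, lambda r: r % 2 == 1)
    print(rows, missing)
```

Whether the application returns a partial answer, as here, or refuses the request entirely is a design decision; either way, the caller has to be aware that part of the data was unreachable.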
Magento: “Database sharding is splitting a database into multiple ‘shards’ or ‘pieces’.”