K004: Storage – The Third Factor & Brain of Modern Infra

Transcript

Episode Introduction

Kamalika: Till now, we have covered the design called 10-factor infra, which is a framework for building robust, cloud-ready modern infrastructure. In the past two episodes, I covered the first two factors, which are network and system. In this episode, I am going to cover the third and the most important factor of the 10-factor infra. This is the brain in the anatomy of modern infrastructure, and it is called storage.

Hey there, welcome everyone. Thanks for listening to Cloudkata - Mastering Modern Infrastructure. Learn how to design cloud-ready modern infrastructure with zero-downtime deployment, security and effective FinOps with me, Kamalika Majumder. I am the director and founder of Stacks, a consulting service for DevOps-driven modern infrastructure. Cloudkata is available on all podcast platforms, so tune into the infra journey on your favourite podcast app and learn the art of mastering modern infrastructure. Do visit my website, www.cloudkata.com. That's right, it's cloudkata.com. Subscribe to the complete playlist along with the transcripts, so you get notified about upcoming episodes and the various supporting blogs and articles that get published. And if you have any queries or feedback about the sessions, connect with me on cloudkata.com. So let's continue our deep dive into the anatomy of modern infra, on Cloudkata - Mastering Modern Infrastructure.

Intro Music Plays:

Till now, we have covered the design called 10-factor infra, which is a framework for building robust, cloud-ready modern infrastructure. In the past two episodes, I covered the first two factors: network and system. The network factor says that in order to achieve performance, accessibility, privacy and security, we need a well-segregated network, perimeter security, a secure entry point and dedicated peer-to-peer connectivity. In the second episode, we discussed systems. We learned that immutable, stateless compute servers should be built from version-controlled machine images, standardised across all environments, and enhanced with auto scale-out, that is, horizontal scaling on demand. We also learned that regular validation of system hardening, patching and upgrades should be done through configuration management systems, and that prepaid hosting plans for servers are the most effective way of managing FinOps, so that you do not end up overshooting your budget.

In this episode, I am going to cover the third and the most important factor of the 10-factor infra. This is the brain in the anatomy of modern infrastructure, and it is called storage. The layer where your data is stored is the most critical layer: it decides what is visible to your customers and how you deliver your services. In today's session, I will start with the challenges around storing data and the impact of those challenges on your business. I will discuss the various types of data storage, and how you can run effective FinOps so that your budget stays well under control. I will also touch upon managed and self-managed services for data-critical applications; in the last episode I discussed managed and self-managed services for compute, and in this episode I will cover the same for data. I will also cover the various deployment models for the data stored on your infrastructure, and how you can use configuration management to implement those models effectively. And lastly, I will discuss how to achieve data security on today's modern infrastructure, that is, the cloud.

So let us begin today's episode by looking into some of the challenges that we see in the data layer, or storage layer. First up - and I would not call this a challenge, it is more of a requirement - is availability. Today's online world runs 24x7, and that requires the data systems to also run 24x7. Gone are the days when you could afford weekend downtime or off-hours maintenance windows. Customers can be live from anywhere, anytime, and with the globalisation of services across multiple time zones, the maintenance window we get grows even smaller. Typically, the SLAs or SLOs defined for services are getting tighter and tighter, which is why availability is one of the major demands, and one of the most challenging demands, that today's infrastructure faces. And when we talk about availability here, we mean high availability. That is why a very important factor comes into the picture: replicating data across multiple sites. Standalone stateful systems can no longer suffice; any infrastructure that hosts data on standalone stateful systems cannot meet the high availability demand that today's digital world holds.
The second challenge, or demand, that the digital world is asking for is scalability. You might start with a limited set of data, but as your customer base grows, the demand for storing data also increases. So you need to be prepared to scale out your storage on demand, anytime it is required. Similar to systems scaling, data scaling works at the same pace.
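To make that concrete, here is a minimal sketch of scaling out storage on a managed database, using AWS's boto3 as one example; the instance identifier and sizes are hypothetical, and other clouds have equivalent calls.

```python
import boto3  # AWS SDK for Python; assumes credentials are already configured

rds = boto3.client("rds", region_name="ap-south-1")

# Grow the allocated storage of an existing managed instance on demand.
rds.modify_db_instance(
    DBInstanceIdentifier="orders-db",  # hypothetical instance name
    AllocatedStorage=200,              # GB; must be >= the current allocation
    MaxAllocatedStorage=500,           # let the provider auto-scale storage up to this cap
    ApplyImmediately=True,             # apply now, not at the next maintenance window
)
```

The point of the sketch is that on cloud, storage growth is an API call you can plan and automate, rather than a hardware purchase.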

So, scalability of the database, or more specifically the storage layer, and upgrading it on demand, is another challenge that people face in their data layer.

The third and most important thing in today's online world is security. You will need to secure the data at rest, provide the right data privacy, and comply with in-country data residency and localization policies. Many countries have very specific data hosting regulations for their companies. Especially in Asia, countries want banking and other customer-critical data to be hosted within their boundaries; they do not want any copy of the data hosted outside. So providing security for the data hosted on cloud, such that no one other than authorised people has visibility into it, is one of the major demands of today's digital transformation.

And that brings me to the fourth and, as I say, last but not the least challenge: having a balanced cost, or budget, for your infrastructure. Like I mentioned in the previous episode, from the system layer onwards you start getting charged for your infrastructure, and once you cross the systems layer, or compute layer, and land in the data layer, the cost shoots up. If you do not have an optimised budget, you will not be able to meet the budget you are looking for, and you might end up either overspending or underspending. That also depends on how long a data retention window you are holding. So it is very important to have data retention policies and data archival policies, because you can no longer buy a huge bare metal server and keep piling data onto it. When you are on cloud, you have to think about how much you are spending on the data: is all of it necessary, or can you archive or rotate some of it? I will talk about how to mitigate these challenges in the points coming up.

So let's look into the various types of storage we have today, especially on cloud, and what kind of data can be stored on them. Data is classified into two kinds: data at rest and data in transit. Now, what does data at rest mean? Data at rest typically refers to any data which is sitting on your systems. This can be relational or non-relational database servers holding your user data; it can be your logs and metrics; or it can be static assets like images, videos, documents and files, or even backups and snapshots. And what is data in transit? Data in transit is data which may or may not be stored on any server, but is being transmitted from one place to another. This can be any kind of file transfer that you do with your third-party integration points, or things like logs and metrics that you ship from one place to another.

If you look at the storage-level services that cloud providers offer today, they largely divide their storage into two sections: one is the database, and the other is object storage. Object storage typically refers to dedicated storage where you keep static assets like images, videos and files, while database services give you database tables, relational and non-relational, based on your need. So you have to think about what kind of data you are going to host and choose the solution accordingly. And remember from the previous episode: make sure your compute servers are always stateless.
Now, when you come to the data layer, especially for static assets, you might have to host images, videos and documents. Think of banking: with online banking becoming the new normal, video KYC has become one of the important tools the banking sector uses for customer onboarding. All this video KYC and online verification is done via video recordings, images and the like, and these are all static assets. So when you have data like this, make sure that you use object storage, rather than a standalone compute server with, let's say, 100 GB of disk mounted onto it. The reason I say this is that these static assets, images and videos, might look like they are not critical, but in fact they are super critical: they hold the mechanism for validating your customers, and they are as critical as the relational and non-relational data you store in your database systems.

So what do you do in this case? Use the object storage mechanism of the cloud. What cloud providers give you with object storage is storage spanned across multiple data centres. They literally have multiple sets of hardware supporting your object storage. To you it might look like one storage bucket where you are hosting your data, but it gives you high availability by copying the data across their multiple data centres, or availability zones. So your high availability requirement gets taken care of by itself; it comes built into those object storage services.

And when you go with database servers, there are various options, which brings me to my next discussion: managed and self-managed services. In my last episode I discussed the pros and cons of both. When it comes to database hosting, I would always recommend managed services. First, the database layer is super critical and needs more care than your compute layer. Second, with self-managed services, data replication can become an operational headache: if the replication breaks at some point, due to some kind of network connectivity problem, you have to pitch in and fix it yourself. So rather than the self-managed support and maintenance model, I recommend going with managed services.

Whenever you are on cloud, look into the managed services - but do not go with the public-facing managed services; go with the private-facing ones. In my first episode I discussed how to segregate your network into public, private and protected. Data sits in the private layer, which means incoming and outgoing access to the internet should be blocked; I will explain more on that in the security section. In that model, when you are going with a VPC kind of architecture for your infrastructure, choose the managed services which are VPC-enabled. You should always go with a VPC-enabled network, because that is your private network: you are not sharing network segments with anyone else. The same applies if you have your own segregated and isolated private cloud. Managed services give you support and maintenance, and they give you high availability, because they span multiple data centres. So always go with managed services - but be careful about how these managed services are owned by the cloud providers. You might have to look into certain NDAs with your cloud provider to validate that they are not saving a copy of your database anywhere outside the country.
Some countries have very strict data localization policies - Indonesia, for example, is very strict: data, and especially customer data, should not leave the country. So you might have to validate this with your cloud provider, get into an agreement, and make them comply with it. The other thing to take care of with managed services is your IP protection. And a further benefit of going with managed services is that you get automated backups and restoration capabilities.
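As a concrete illustration of the object-storage setup described above, here is a minimal boto3 sketch: the bucket is pinned to one in-country region (which is what residency rules usually hinge on), versioned, encrypted by default, and blocked from public access. The bucket and file names are hypothetical.

```python
import boto3

s3 = boto3.client("s3", region_name="ap-south-1")
bucket = "example-kyc-assets"  # hypothetical; bucket names must be globally unique

# Pinning the bucket to a single region keeps the data in-country.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)

# Versioning protects static assets against accidental overwrite or deletion.
s3.put_bucket_versioning(
    Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
)

# Encrypt everything at rest by default.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]
    },
)

# The data layer is private: block all public access to the bucket.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Static assets such as KYC videos go to object storage, not a compute disk.
s3.upload_file("customer-123-kyc.mp4", bucket, "kyc/customer-123.mp4")
```

The provider replicates objects across availability zones behind the scenes, which is where the built-in high availability mentioned above comes from.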

Now, I will talk about restoration versus replication mechanisms, and which one is recommended. However, I would like to mention that even if you are going with a highly available, replication-based service, today's auditors will still ask whether you have regular backups. That is because the cloud provider may not have multiple regions within the same country - they may all be localised in the same city. This is true for most Asian and Southeast Asian countries; it is not like the West, where Google or AWS spans multiple cities in a country. In Asia and Southeast Asia there are very few countries with multiple cities offered as regions by the cloud provider. In those cases, auditors do ask: do you have an automated backup? How regularly do you take backups? Do you test the backups? I will cover what you can do in those cases later. The point I am making here is to always go with managed services, with the high availability options and backups enabled, because you may have to show this for audit and regulation purposes.

Once you have chosen your storage type - object storage, database, whichever you choose - the next thing to be careful about while making this choice is your FinOps. Cost is a big matter of concern when it comes to the storage layer. You might think that at the beginning of the year you will sign up for a certain amount of storage and be good to go. Somebody may think: no, I will just start with, say, 50 or 100 GB, that is my data size and I will be fine - but you never know when you will need to scale out. One thing people do not realise is that in the data layer, just like in the compute layer, if you commit up front that your data may reach a certain maximum level during the year, you will save money. If instead you start small and keep extending, you will end up spending more, because you will not be able to leverage the subscription or commitment model the cloud provider gives you. So always think: what is the business targeting in that particular year? How many users - is it 5,000, 10,000, 1 million - and how much data will that generate? Multiply it by two or four, and opt for that kind of storage. You can always plan for it: if you do not actively use it, you will not be charged for it, but at least have that reservation in your budget. And always go with the subscription model, especially for your database services - even for services like Redis, which store data. You would not believe the comparison: you can save thousands of dollars between the prepaid and postpaid models. So, like compute, for data systems always go with a prepaid model, because that way you will save a lot of money and handle your finances effectively, without getting bothered by "how much am I spending, I don't have enough budget, but I need to scale out" and things like that.
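To make the "multiply it by two or four" arithmetic concrete, here is a rough back-of-the-envelope sketch; every number in it is an assumption for illustration, not a benchmark.

```python
# Back-of-the-envelope storage budgeting for the year.
users = 1_000_000        # target user base for the year (assumed)
mb_per_user = 5          # average data stored per user (assumed)
headroom = 4             # the "multiply by two or four" safety factor

raw_gb = users * mb_per_user / 1024
committed_gb = raw_gb * headroom

print(f"expected: ~{raw_gb:,.0f} GB, commit/reserve for: ~{committed_gb:,.0f} GB")
# expected: ~4,883 GB, commit/reserve for: ~19,531 GB
```

The committed figure is what you would take to the provider's subscription or reservation model, rather than growing in small unplanned increments.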

Now, once you have chosen which model you are going to go with, the next thing to look into is the different deployment models available. When you are designing your database or storage systems and services, there are multiple deployment models. The first is based on mirroring, or replication: you have a stretched cluster of servers - it may or may not be visible to you - with a replica of your data across multiple data centres or, in cloud terms, multiple availability zones. The second is an active and hot standby: you are actively writing to one region or zone while keeping a copy of the data in the other. You typically see this in a DC-DR kind of setup. Based on how much you can spend and how much you can plan, you will have to choose which model satisfies your requirement. The best model is, of course, active-active - you could even call it the blue-green deployment model - which means all the servers are always up and running, so you can do maintenance without taking the whole system down. Now, is a blue-green deployment model possible for database systems? Well, not 100%, but you can get very close to it, especially if you are using managed database servers: an upgrade takes hardly a few minutes, provided you have followed all the guidelines. Still, at the storage layer it is always advisable to stop any kind of writes while you are doing upgrades. No matter how well-managed a service you have taken, there is still a chance of data corruption when a machine shutdown or restart is going on and somebody is trying to write data to disk. Do not forget that the cloud might seem like a transparent black box to you, but in the backend it is actually running a number of servers to host all these applications.

The next deployment model is based on integration. Here your setup may be service-specific or data-specific: you may be hosting the applications and your own application data, while also consuming data from a third party. Let's say you are a bank: you might be consuming data from a core banking system sitting somewhere else, alongside your own application and its data. So this is a model with multiple sites - more like a hybrid cloud model - based on the integration of your applications and your data centres. You typically see this in fintechs, banks and healthcare, or in countries with very strict data localization policies, where they do not want the data hosted in cloud. In India, for example, you will see very few banks on cloud; most are in their own private data centres, although they are moving towards it. In some places, especially with mobile banking applications, the mobile banking services run on cloud but the data still sits on premise, in a satellite office or an on-premise data centre hosting the core banking data. In that case you have an integration-based deployment model.

The remaining models are more about where things are hosted. If you are on cloud, you can have a zonal database system - a single availability zone or multiple availability zones - and then a regional one, which is not just multi-availability-zone but also spans cities. Zonal gives you high availability within the same city: your data is hosted across multiple data centres in one city. Regional expands that high availability to another city, so if there is a natural calamity in one city, you still have another city where your data is hosted. This is very important especially in Southeast Asia, which is prone to natural disasters like earthquakes and tsunamis; a region-based model is helpful in those scenarios. And then, of course, there is the traditional or legacy localised system, hosted on self-managed virtual machines or servers.
These are pretty much the deployment models that you have. As for a recommendation: based on which domain you are in, and which regulations or compliance you have to certify against, you will have to choose your option. However, the best option so far is the active model based on mirroring and replication - that means a managed service, enabled with high availability spread across multiple availability zones, and enabled with all the advanced features for maintenance and upgrades.
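Here is what that recommendation can look like in practice, again sketched with boto3 against AWS RDS; the identifiers, engine and sizes are illustrative assumptions rather than prescriptions.

```python
import os
import boto3

rds = boto3.client("rds", region_name="ap-south-1")

rds.create_db_instance(
    DBInstanceIdentifier="orders-db",     # hypothetical name
    Engine="postgres",
    DBInstanceClass="db.m6g.large",
    AllocatedStorage=100,
    MasterUsername="dbadmin",
    MasterUserPassword=os.environ["DB_ADMIN_PASSWORD"],  # from a secret store, never hard-coded
    MultiAZ=True,                 # synchronous replica in another availability zone
    StorageEncrypted=True,        # encryption at rest, enabled by a single flag
    BackupRetentionPeriod=7,      # automated daily backups, for the auditors
    DeletionProtection=True,
    PubliclyAccessible=False,     # the data layer sits in the private network, never public
)
```

One flag each buys you the replication, the encryption and the automated backups discussed in this episode; that is the operational appeal of the managed model.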

Also check what your regulator will accept - most regulators do accept this model. They mostly care about two important things: where the data is hosted, and how much availability you are committing to customers. They just want to ensure that none of the customer data is compromised or lost. So a replication model on managed services - and the regional option, if you have used it - will give you a truly highly available data layer.

Now that you have adopted all these design principles, one very important thing remains: configuration management. Like I explained for the system layer, it is equally important to have configuration management for your data layer. For instance, if you are creating tables, uploading data or creating users, make sure it is always done using a good configuration management tool - or at least automate it and keep it version-controlled in your source code repository. And version the copy of the data you are loading, or the query you are running, as well, because that way you can correlate the version of the database configuration with the version of the application.
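A minimal sketch of that version-controlled approach: numbered SQL migration files live in the application repository, and a small runner records which ones have already been applied, so every schema or data change is traceable to a commit. SQLite is used here purely for brevity; a real setup would point at your database server, or use a dedicated tool such as Flyway or Liquibase.

```python
import pathlib
import sqlite3

def migrate(db_path: str, migrations_dir: str = "migrations") -> None:
    """Apply every *.sql file, in name order, exactly once."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "  name TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}

    for path in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
        if path.name in applied:
            continue  # already applied to this database; skip
        conn.executescript(path.read_text())  # e.g. 0003_add_kyc_table.sql
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (path.name,))
        conn.commit()
        print(f"applied {path.name}")

if __name__ == "__main__":
    migrate("app.db")
```

Because the migration files are ordinary source files, a database change reviews, versions and rolls through the pipeline exactly like an application change.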

Once you go live with version one of your application, everything that comes in after that is essentially changes in data.

You might release new features on the application side, but the most frequent changes you observe will be in the data: different columns, different models that you upload as your business and user base expand. So having configuration management for your database migrations and upgrades, using a version control system and versioned data, will help you track which change was done for which feature, and which feature correlates to which change in the data. Configuration management is also important when it comes to database expansion - disk expansion.

There is one particular case that still holds true in some places: files and folders transmitted through SFTP servers. I know this is a very old-school mechanism, but there are still traditional organisations which only use SFTP. As I mentioned, in most countries, especially in the Asian region, banks and fintechs are less often on cloud, and in those cases file transfer is done using SFTP servers. With raw files being transmitted over SFTP, you sometimes really do have to extend the disk the data is stored on. In those situations, if you have a configuration management system, it is easy to upgrade the disk and to keep the change in your source control repository, flowing through your pipelines, so you can track what change was done. It also helps you show your regulators that every change that goes into your system is tracked, traced and properly validated. So configuration management systems are not just important for your compute layer; they are just as important for your storage-level systems, be it databases or static object storage.

That brings me to the last feature of the storage area, which is very important: security. Security of data at rest, as well as data in transit, is a compulsory need for any kind of organisation. Until a few years back it was only critical for banks, fintechs or healthcare, but now it is an important requirement for any business, especially any business on cloud. So what comes under security of data? The first thing is encryption. The data stored in the cloud, as I mentioned earlier, is - no rocket science here - actually on someone else's computer, someone else's server. Someone procured a huge number of servers and a data centre, automated the whole setup, and gave you just the interface to host your application; that, in short, is the definition of cloud. The managed services on cloud are pretty much the same thing. So even when you are hosting your data on a virtual server or a managed service, do not think there is a bare metal server dedicated to you. Unless you explicitly go with a dedicated physical server, the data is hosted on some virtual machine, on some bare metal server, which also hosts someone else. And since these managed services are handled by the cloud providers, the cloud provider will have visibility of the data - the raw data - that you have.
Now, what can you do to make sure that nobody suddenly goes ahead, copies your data and passes it on to the dark web? You will have seen incidents like that: somebody hacked into a system, copied all the data and sold it on the dark web. How can you prevent it?

First, remember that in the network session I told you we need a well-segregated network. We should not keep all the layers - public, private, protected - in one single subnet. That is very risky, because the public-facing web server can get compromised, and that might lead to compromising all the other servers lying in the same segment. The private segment is typically meant for the data layer. So the first thing you achieve, if you follow the network design principles I mentioned, is network-level isolation of the data. Even if somebody breaks into one of the web servers, they will not be able to go directly into the data server, because the data server allows traffic only from another layer, which is protected by multiple hops. Security, if you look at it, is really about creating multiple locks along the hacker's way, so that it prevents them from getting in easily - and about getting yourself notified when somebody is trying something fishy.

Once you have hosted your data layer in a private subnet, the next thing you need is encryption of the data. One thing cloud providers do commit to is that they will not have any root access to your database servers. They may have physical access to the data centres, the hard disks and the servers, but they will not have any logical access into the database. Now, let's say somebody copies the whole hard disk, or takes a snapshot of it, and tries to break into it: they will not be able to, because the data on the disk is encrypted.

A very important thing when you encrypt the data is that there are two ways you can store the decryption key. You can store it with the cloud provider - they provide a key management system (KMS) - or you can store the key yourself, internally. I remember with one of my clients a couple of years back there was a debate: yes, we are going to cloud, we have encrypted the disk - now where does the decryption key live? Should we approve the KMS as the place for it to be stored, or should we keep the key with ourselves? That again depends on how much of a trust agreement you have with your cloud provider, and whether you have enough legal measures in place in case the cloud provider violates those norms; you will have to take care of those processes and compliance needs.

But the foremost thing is encryption of data at rest. This goes for your database servers and services, like RDS or any kind of SQL database, and for the object storage where you store your files and folders. Do not think that just because it is images it is not critical - just encrypt it, because it does not cost you anything extra; most data services give this feature as an add-on. That way you at least have some safeguard for your data hosted on cloud. And as I said, the data encryption key should be either fully owned and managed by the cloud provider, or owned by you and stored with the cloud provider, or fully owned and stored within your premises.

Next, you should have a very strong user management and secret management system. Do not use the same user for every service: make sure you have a segregated service account per service, and make sure the secrets are stored in some sort of vault. Do not use just one DB name, username and password for all of your databases; divide all of your services based on roles. So role-based access control should also be implemented for the databases.
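As one concrete way to implement the segregated service accounts and role-based access just described, here is a hedged sketch for PostgreSQL using psycopg2; the role names, host and password sources are illustrative assumptions.

```python
import os
import psycopg2  # assumes a PostgreSQL database and the psycopg2 driver

conn = psycopg2.connect(
    host="orders-db.internal",                 # private-subnet endpoint, never public
    dbname="orders",
    user="dbadmin",
    password=os.environ["DB_ADMIN_PASSWORD"],  # pulled from a vault, not hard-coded
)
conn.autocommit = True
cur = conn.cursor()

# One dedicated account per service, with only the rights that service needs.
cur.execute("CREATE ROLE orders_app LOGIN PASSWORD %s",
            (os.environ["ORDERS_APP_PASSWORD"],))
cur.execute("GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO orders_app")

# A read-only account for reporting: it can never modify customer data.
cur.execute("CREATE ROLE reporting_ro LOGIN PASSWORD %s",
            (os.environ["REPORTING_PASSWORD"],))
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_ro")
```

A script like this belongs in version control and runs through the pipeline, so account creation is itself tracked and traceable, as discussed earlier.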

That is pretty much, in summary, what you can do to secure your data at rest on cloud and meet data privacy, confidentiality and data-sharing policies. Now, another kind of data that we quite often ignore is logging and monitoring data. With the evolution of containers, and more and more organisations adopting them, it has become very easy to collect all the raw logs from your containers, format them and look into them. But sometimes you also have PII data inside these logs and metrics. If your logs and metrics contain user account numbers, or passwords and things like that, and you are using a managed system for your logging and monitoring data, then that data also becomes critical. So when you are using a system like that, look into its data-sharing policies, and give the same level of importance to logging and monitoring data as you would to your static assets or your user database. I will talk more about logging and monitoring, and what can be done for the data they store, in upcoming episodes; here I just wanted to touch upon all the areas you should take care of when designing your storage-level services.
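On the key-ownership question above, one pattern worth knowing is to encrypt sensitive files in your own application before they ever reach the provider, so the key never leaves your premises. A minimal sketch with the `cryptography` library; where the key actually lives (your internal vault, here) is an assumption you would adapt.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Generate once, then keep this key in YOUR secret store, not with the provider.
key = Fernet.generate_key()
f = Fernet(key)

with open("customer-123-kyc.mp4", "rb") as fh:
    ciphertext = f.encrypt(fh.read())

with open("customer-123-kyc.mp4.enc", "wb") as fh:
    fh.write(ciphertext)  # upload this file; a copied disk or snapshot is useless without the key

# Later, to read the asset back:
plaintext = f.decrypt(ciphertext)
```

With this pattern, even a provider-side snapshot or a copied disk exposes only ciphertext, which is exactly the safeguard the encryption discussion above is after.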

So this, in short, covers the configurations you should take care of in your data layer, or storage layer. I would like to summarise what you should look after while designing it.

First, make sure you follow a replication mechanism, not restoration - replication, not restoration. Why? Because replication gives you active availability, whereas restoration is a standby model. That means with restoration you need downtime, and that can be anything between five minutes and a number of hours. In today's scenario, you cannot hold your users back and say: no, you cannot do any transactions until my database is restored. So a replication model is the best model to achieve high availability for your databases.

Second, version-control your database config and data. As I mentioned earlier, use a configuration management tool to make any changes in your database layer, just as you would for your application deployments, and always keep it in a version control system, so you can track and trace which changes went in at what point of time and what the requirement for each change was.

Third, as I mentioned earlier, do not use compute services for data storage or object storage. Compute servers are meant for computing; they are not meant for heavy data writes and reads. Containerisation systems will tell you that you can mount volumes and write to the disk, but those mount points do not give you high availability - you still have to replicate the data across multiple disks. So if you have a Kubernetes cluster, and you think that now you have a persistent volume mounted and you can install your MongoDB server on it: do not do that. You can have the MongoDB service store data on one persistent volume - say you are running one container with MongoDB writing to one PVC - but who gives you the high availability? For the replication you have to install yet another service, and it is not foolproof: there will be data corruption, issues and downtime involved. Worst case, it makes the whole compute cluster prone to downtime. You cannot achieve zero-downtime deployment for your compute services if you end up hosting your compute workloads - say your microservices - and your databases on the same Kubernetes cluster; even for application deployments you will have to take downtime. So do not use compute services for data storage. Always use data services for data storage, and object storage for static files and assets, because that way you get the data replication capability: object storage is usually a regional service, meaning it spans multiple availability zones or a region, not one or two AZs. When you upload data into your bucket, it is already highly available, right there in your account. And if you have situations like SFTP servers, you can mount object storage as a mount point on the SFTP server: the application runs on the server while your actual files live in the object storage, which can also be encrypted - and accessed through a UI, if you have an operations team that is not technical enough to go to the terminal.
The next thing: have encryption of data at rest, and make sure the decryption keys are properly owned and managed. And the last, but not the least: back up the data, not the machine. I have seen many organisations, in order to meet compliance demands, just back up the whole database system - even when they have a highly available replication mechanism, they still back up the whole system. That does not help. First, the system snapshots grow in size and, in turn, end up eating a lot of your budget; plus these snapshots are not reliable - very often the data inside a snapshot gets corrupted and you are not able to recover it properly. So if you still have to take backups, when you are using managed services you can enable backups which actually capture the data that is stored, and not the system.
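A hedged sketch of "back up the data, not the machine": a logical dump of the database pushed to object storage, instead of a full machine snapshot. The tool choice (pg_dump for PostgreSQL), names and paths are assumptions; credentials are expected to come from PGPASSWORD or a .pgpass file.

```python
import subprocess
from datetime import date
import boto3

# Logical backup: just the schema and data, not the whole machine image.
dump_file = f"/tmp/orders-{date.today()}.dump"
subprocess.run(
    ["pg_dump", "--host", "orders-db.internal", "--username", "backup_ro",
     "--format", "custom", "--file", dump_file, "orders"],
    check=True,  # fail loudly; an untested backup is not a backup
)

# Ship the dump to versioned, encrypted object storage.
boto3.client("s3").upload_file(
    dump_file, "example-db-backups", f"orders/{date.today()}.dump"
)
```

A dump like this stays small, restores cleanly to a fresh instance, and is exactly the artefact auditors ask to see tested.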

So if you have a requirement to keep a backup, always back up the data, not the machine. And that is pretty much the summary: replication, not restoration; version-controlled config and data; don't use compute services for data storage - use data services and object storage; back up the data, not the machine; and encrypt the data at rest. I hope you liked today's session and that it gave you more visibility into the anatomy of modern infrastructure. So far we have covered the core layers of modern infrastructure: network, system and storage. From the next session onwards, we will start discussing the operational factors needed to manage and administer these core layers in order to achieve a cloud-ready modern infrastructure. So join me back next Friday on Cloudkata, where I share more katas for modern infra. With that note, I would like to conclude today's episode. Subscribe to the show on cloudkata.com - I repeat, cloudkata.com. If you have missed listening to the complete episode, don't worry: you can get the transcript on the podcast page on cloudkata.com. This is your infra coach Kamalika, signing off. Enjoy your weekend, take care, stay healthy, stay safe and keep learning.

Transcribed by https://otter.ai
