This blog post is targeted at engineers who are new to system architecture and design. Those who have recently been promoted to a System/IT/Cloud Architect role, and who will be responsible for designing robust, scalable, secure, performant and cost-optimized architectures, will benefit most from this post. However, anyone interested in the subject should find it interesting and useful.
I will explain how I approach complex problems and tasks and take the first step towards designing the architecture.
The dictionary definition of architecture:
Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings or other structures.
Since I am an IT architect, my focus is on creating IT systems architecture. It has nothing to do with designing buildings or other physical structures.😊
In the context of IT, architecture is defined as:
- In reference to computers, software or networks, the overall design of a computing system and the logical and physical interrelationships between its components. The architecture specifies the hardware, software, access methods and protocols used throughout the system.
- A framework and set of guidelines to build new systems. IT architecture is a series of principles, guidelines or rules used by an enterprise to direct the process of acquiring, building, modifying and interfacing IT resources throughout the enterprise. These resources can include equipment, software, communications, development methodologies, modeling tools and organizational structures.
Do you think designing an architecture means opening a tool like Visio, diagrams.net (draw.io) or Lucidchart and creating the flow diagram straight away on the very first day?
The answer is an absolute “No”... right?
The first step is to understand the business or technical problem that your client wants to solve with your help. What do you have to do to understand the problem?
It’s simple: you need to ask lots of questions. Keep asking your client and other relevant stakeholders questions until you understand the problem or requirement well enough to identify the starting point, i.e. the point at which the deeper discovery phase begins.
Ask questions and capture the answers in a document or spreadsheet. Once you are satisfied with your questions and answers, prepare a checklist that you can use to dig deeper into their environment and understand it better. Trust me, the questions will never end, i.e. every answer raises another question.😊
That does not mean you should keep asking questions forever without taking any action, or that you should wait until every piece of information has been collected and then design the whole architecture in one go, like a big-bang release or deployment. That should not be the intention.
Once you have gathered sufficient information, you need to take the first step and actually start designing the architecture.
With that in mind, my own definition of designing an architecture is: “Gather information by asking questions to understand the problem or requirement, and keep recording it in a document or spreadsheet. Once sufficient information has been collected and you understand the problem or requirement, start depicting it in pictorial or diagram form, following the best practices prevailing in the industry globally.”
I am a big fan of the book “Start With Why” by Simon Sinek, and I love asking questions. I encourage all new IT professionals to read this book at least once. Trust me, it will help you focus your vision and ask the right questions.😊
Designing an architecture in the IT world is an evolving process that never really ends as long as the organization exists and continues to use its IT environment. There is no single final architecture that is flawless and complete in every respect. So you need to keep asking questions even after the architecture has been deployed and is in operation. Either you or somebody else has to keep doing it in order to keep the architecture evolving and operational in the current dynamic, i.e. constantly changing, IT environment. Information Technology itself keeps evolving, even though it has already reached an advanced stage.
Let’s now discuss what type of questions to ask during the initial phases of the conversation. Try to ask more generic questions at this stage to understand the problem from a broad perspective. The initial conversations mostly happen with the client’s senior leadership, such as CxOs, Directors and IT managers, and many of these conversations may already have been held by your own client leadership team who are responsible for procuring the project. In that case, you can forward your initial queries to that leadership team, who might already have the answers. Otherwise, they will either forward the questions to the respective client stakeholders or connect you with the client so that you can ask your questions directly. The most likely questions include, but are not limited to:
- What exact problems are you trying to address through this engagement with us?
- What requirements do you want us to fulfill?
- What are your expectations from us?
- What type of IT infrastructure do you currently have, i.e. on-premises, in the cloud or hybrid?
If the engagement relates to any kind of digital transformation, IT modernization or greenfield project, the above questions cover all of them and provide a broad idea of the next set of questions to ask.
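Since every answer tends to raise another question, it helps to keep the question log structured from day one. Below is a minimal Python sketch, assuming a simple log with one row per question, an optional parent question ID for follow-ups, and a CSV export that any spreadsheet tool can open. The class name `QuestionLog` and its columns are hypothetical; adapt them to whatever document or spreadsheet your team already uses.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical spreadsheet columns for the discovery question log.
FIELDS = ["id", "parent_id", "question", "asked_to", "answer", "status"]


@dataclass
class QuestionLog:
    """Discovery log: one row per question, with optional links to the question it follows up."""
    rows: list = field(default_factory=list)

    def ask(self, question, asked_to, parent=None):
        # Record an open question and return its ID so follow-ups can reference it.
        qid = len(self.rows) + 1
        self.rows.append({
            "id": qid,
            "parent_id": parent if parent is not None else "",
            "question": question,
            "asked_to": asked_to,
            "answer": "",
            "status": "open",
        })
        return qid

    def answer(self, qid, text):
        row = self.rows[qid - 1]
        row["answer"] = text
        row["status"] = "answered"

    def open_questions(self):
        # Questions still waiting for an answer: the agenda for the next meeting.
        return [r["question"] for r in self.rows if r["status"] == "open"]

    def to_csv(self):
        # Export to CSV so the log can be opened in any spreadsheet tool.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()


log = QuestionLog()
q1 = log.ask("What type of IT infrastructure do you currently have?", "Client IT manager")
log.answer(q1, "Hybrid: on-premises plus a public cloud")
# The answer immediately raises a follow-up question, linked back to q1.
log.ask("Do any cloud applications depend on on-premises systems?", "Client IT manager", parent=q1)
print(log.open_questions())
print(log.to_csv())
```

Linking each follow-up to its parent keeps the chain of reasoning visible later, when you need to explain why a particular design decision was made.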
Let’s take an example. Assume that the client responded to the above questions with the following sample answer:
“Our existing IT infrastructure runs in hybrid mode, with an on-premises setup and a public cloud setup, and we have very secure network integration between the two platforms. We are running many legacy applications on-premises that we want to decommission soon. We would like to identify which legacy applications could be migrated to the cloud, and how we can replace those that cannot be migrated in their current state with newer platforms. In addition to the legacy on-premises setup, we are running older legacy web applications on virtual machines in the cloud, which are not performing well and are not scalable. We want users to be able to use the application seamlessly even when many other users are using it at the same time. Hence we have decided to engage a vendor who can help us resolve these problems.”
I believe the above answer is quite informative and gives a good indication of what further questions to ask. There are two areas to look into:
- Modernization and migration of the legacy on-premises setup to the cloud.
- Refactoring, rebuilding and replatforming the existing web applications that are already running in the cloud, to make them scalable and optimize them for performance.
Considering the above two points, we can ask the customer which one is more critical to the business and should be addressed first.
Let’s say the customer responded that they want us to take up the second point first, because the web applications serve their direct customer-facing e-commerce platform, so the poor performance of the application is impacting their business.
Now we have a clear idea of where to focus and what the next set of questions should be in order to discover the existing cloud environment and understand it better. At the same time, we also need to start discovery of the on-premises environment, because the existing application running in the cloud may be communicating with the on-premises setup in some way, and it may even have a hard dependency on it.
One more crucial question needs to be asked before going deeper into the discovery phase, and it is about the application itself. Since I am an Infrastructure Architect, my focus is on designing a suitable platform to host the newly redesigned application. In order to design such an infrastructure architecture, I need to understand the functionality and working of the application itself. How do you get that understanding?
I need to connect with the developers who are actually responsible for developing the new application and ask them the following questions:
- What strategy are you going to use to modernize the application? (e.g. refactoring, rebuilding or replatforming)
- Are you going to refactor the application code, keeping the same functionality, and host it the same way it is hosted now, i.e. on virtual machines/instances in the cloud?
- If refactoring does not work for any reason, are you going to rebuild the application using a newer tech stack and framework?
- Are you going to rebuild the application as distributed microservices hosted on a container orchestration platform such as Kubernetes? (This also involves replatforming, because the application moves to a new platform like Kubernetes.)
- If the new application is going to be microservice based, how many microservices will be deployed?
- How many of those microservices are front-end and how many are backend/middleware?
- What type of relational database will be used, e.g. MySQL, PostgreSQL, MariaDB, Oracle etc.?
- Are you considering a managed version of that database in the cloud, or do you want to host and manage your own databases?
- Is there a need for any kind of NoSQL database?
- How will service-to-service communication be handled? (i.e. REST API based HTTP calls, or some asynchronous communication mechanism such as Kafka)
- How many APIs will be exposed to the external world?
- What DevOps CI/CD strategy will be followed, and which tools will be used?
If we use Kubernetes, we will most likely use an Ingress controller to handle inbound web traffic, in which case we may need path-based routing. Hence we need to check with the developers how many microservices are to be exposed through the Ingress controller to receive web traffic from the public.
If one or more APIs are to be exposed to the external world as per the application design, then we have to consider a robust and secure API management service.
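To make the path-based routing idea concrete, here is a sketch that builds a Kubernetes Ingress manifest (API version `networking.k8s.io/v1`) with one path rule per publicly exposed microservice. The service names, ports, host name and the `nginx` ingress class are hypothetical placeholders; on AKS the class name depends on which Ingress controller you actually install. Internal services that only talk to each other stay out of the Ingress entirely.

```python
import json


def build_ingress(name, host, routes, ingress_class="nginx"):
    """Build a networking.k8s.io/v1 Ingress with one path-based rule per exposed microservice.

    routes maps a URL path prefix (e.g. "/cart") to a (service_name, service_port) pair.
    """
    paths = [
        {
            "path": path,
            "pathType": "Prefix",  # match the path and everything below it
            "backend": {"service": {"name": svc, "port": {"number": port}}},
        }
        for path, (svc, port) in sorted(routes.items())
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": ingress_class,
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


# Hypothetical front-end microservices the developers said must be reachable from the internet.
exposed = {
    "/": ("storefront-web", 80),
    "/cart": ("cart-api", 8080),
    "/catalog": ("catalog-api", 8080),
}
manifest = build_ingress("shop-ingress", "shop.example.com", exposed)
# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

The number of entries in `exposed` is exactly the number the developers give you in answer to "how many microservices are to be exposed through the Ingress controller".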
In addition to the above application-related points, there are many other points to discuss, such as how to handle egress traffic, networking etc. However, before going any further, one of the most crucial questions to ask the customer is about cloud service provider (CSP) selection (this may be optional in your case). We need to ask the following questions:
- Are you planning to continue using the same CSP, or do you have plans to migrate to a different one?
- If you want to migrate to a different CSP, what is the reason?
- Do you need any assistance comparing CSPs to help you make the final decision?
If they need assistance with CSP comparison, we can refer to my other blog posts on CSP selection considerations and act accordingly.
Along with gathering information about the new application, which may already be under development, we also need to gather information about their existing IT infrastructure (both on-premises and cloud) as part of the discovery process. The earlier we get this done, the better it is for continuing with the architecture design process.
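Once the discovery spreadsheet starts filling up, a few lines of code can surface things that are easy to miss by eye, such as cloud workloads that depend on on-premises systems (the hard dependencies mentioned earlier). This is a minimal Python sketch, assuming the spreadsheet is exported to CSV with hypothetical columns such as `hostname`, `location` and `depends_on`, and that each host lists at most one dependency; the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical discovery rows; in practice, export them from the discovery spreadsheet as CSV.
INVENTORY_CSV = """hostname,location,role,os,vcpu,ram_gb,depends_on
web-vm-01,cloud,web,Ubuntu 20.04,4,16,db-onprem-01
web-vm-02,cloud,web,Ubuntu 20.04,4,16,db-onprem-01
db-onprem-01,on-premises,database,RHEL 7,8,64,
erp-onprem-01,on-premises,legacy-app,Windows Server 2012,8,32,db-onprem-01
"""


def load_inventory(text):
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows):
    """Count hosts per location and list cloud hosts that depend on on-premises hosts."""
    by_location = Counter(r["location"] for r in rows)
    location_of = {r["hostname"]: r["location"] for r in rows}
    # A cloud host depending on an on-premises host is a hybrid dependency that must be
    # understood before the cloud application is redesigned or the on-premises host retired.
    hybrid_deps = [
        (r["hostname"], r["depends_on"])
        for r in rows
        if r["location"] == "cloud"
        and r["depends_on"]
        and location_of.get(r["depends_on"]) == "on-premises"
    ]
    return by_location, hybrid_deps


rows = load_inventory(INVENTORY_CSV)
by_location, hybrid_deps = summarize(rows)
print(dict(by_location))
print(hybrid_deps)
```

With the sample rows, both cloud web VMs turn out to depend on the same on-premises database, which is exactly the kind of finding that changes the order of the migration work.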
You can use a template similar to the below sample spreadsheet for Infra discovery (Customise it as per your requirement):

Once all the relevant information has been collected, we can start designing the architecture in pictorial or diagram form using a tool like Visio, diagrams.net (draw.io) or Lucidchart.
A sample base architecture for a microservice-based application using a managed Kubernetes service on Azure, i.e. Azure Kubernetes Service (AKS), might look like the diagram below. The architecture should be designed with provisions to add more resources in the future if necessary, so that we can leverage newer technologies like IoT, big data and data analytics.

Once it is created, you need to present it to the customer for review and approval along with supporting explanation documents.
Note: Security is one of the most critical aspects to consider while designing the architecture. You can refer to my blog for more details on the points to consider.
I would like to stress again that designing an architecture is an evolving process. It has to be based on all the relevant information gathered as discussed above, and we also need to follow the Well-Architected Framework defined by each of the CSPs. The Well-Architected Framework defines various pillars that drive architectural excellence at the fundamental level of a workload.
As architects, we have to design the architecture so that more features can be added without redesigning it completely. In other words, evolving does not mean that we have to redesign the architecture for every future modification or every error/bug fix at a later stage. It should be as modular as possible, so that fixing one component does not impact other connected components. By expansion, I mean that new features can be plugged into the existing architecture without too many modifications.
While designing the network for the architecture, it is recommended to use a layered approach, i.e. a hub-and-spoke topology, with provisions to handle both north-south and east-west traffic properly. The hub network and each spoke network is its own secure private network in the cloud, i.e. a VNet (Azure) or VPC (AWS/GCP). This makes it easy to expand the architecture in the future. North-south traffic typically refers to communication between devices inside the organization and devices or services outside it, such as the internet or cloud-based services. East-west traffic refers to traffic that stays within the network/organization, for example between internal VNets/VPCs connected through VNet/VPC peering. (Refer to the below diagram👇)
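The distinction between north-south and east-west traffic can be expressed as a simple rule over the address plan. The Python sketch below assumes a hypothetical hub VNet/VPC and two spokes with made-up CIDR ranges: a flow is east-west when both endpoints fall inside the organization's networks, and north-south otherwise. It also flags spoke-to-spoke flows, since VNet/VPC peering is not transitive and such traffic is typically routed through a firewall or network virtual appliance in the hub.

```python
import ipaddress

# Hypothetical address plan: one hub and two spokes, all in private address space.
NETWORKS = {
    "hub": ipaddress.ip_network("10.0.0.0/16"),
    "spoke-app": ipaddress.ip_network("10.1.0.0/16"),
    "spoke-data": ipaddress.ip_network("10.2.0.0/16"),
}


def network_of(ip):
    """Return the name of the VNet/VPC containing ip, or None if it is outside all of them."""
    addr = ipaddress.ip_address(ip)
    for name, net in NETWORKS.items():
        if addr in net:
            return name
    return None  # e.g. an internet client


def classify(src, dst):
    """'east-west' when both ends are inside the organization's networks, else 'north-south'."""
    if network_of(src) and network_of(dst):
        return "east-west"
    return "north-south"


def routes_through_hub(src, dst):
    # Peering is not transitive, so traffic between two different spokes has to be
    # forwarded by something in the hub (typically a firewall or network virtual appliance).
    a, b = network_of(src), network_of(dst)
    return a is not None and b is not None and a != b and "hub" not in (a, b)


print(classify("10.1.4.10", "10.2.8.20"))            # app spoke -> data spoke
print(classify("203.0.113.7", "10.1.4.10"))          # internet -> app spoke
print(routes_through_hub("10.1.4.10", "10.2.8.20"))
```

Keeping every spoke-to-spoke flow on a path through the hub gives one central place to inspect and log east-west traffic, while north-south traffic enters and leaves through the hub's edge services.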
Network Architecture & Security Design (Ref diagram taken from Microsoft Azure Documentation):

Pillars of the AWS Well-Architected Framework (the Azure and Google Cloud frameworks define very similar pillars):
- Operational excellence
- Security
- Reliability
- Performance efficiency
- Cost optimization
- Sustainability
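One lightweight way to apply these pillars is to track design review items per pillar and list the gaps before presenting the design. The sketch below assumes a hand-written list of review items for the sample AKS design; the items themselves are hypothetical examples, not the official question sets published by any CSP.

```python
from collections import defaultdict

PILLARS = [
    "Operational excellence",
    "Security",
    "Reliability",
    "Performance efficiency",
    "Cost optimization",
    "Sustainability",
]

# Hypothetical review items: (pillar, check, already addressed by the design?)
REVIEW = [
    ("Security", "Ingress traffic passes through a web application firewall", True),
    ("Security", "Secrets are stored in a managed key vault, not in code", False),
    ("Reliability", "AKS nodes are spread across availability zones", True),
    ("Reliability", "Database has automated backups and a tested restore", False),
    ("Cost optimization", "Cluster autoscaler limits are defined", True),
    ("Operational excellence", "Infrastructure is deployed through IaC pipelines", True),
]


def gaps_by_pillar(review):
    """Group the checks the design does not yet address by pillar."""
    gaps = defaultdict(list)
    for pillar, check, addressed in review:
        if pillar not in PILLARS:
            raise ValueError(f"unknown pillar: {pillar}")
        if not addressed:
            gaps[pillar].append(check)
    return gaps


def uncovered_pillars(review):
    # Pillars with no checks at all have not been assessed yet.
    covered = {pillar for pillar, _, _ in review}
    return [p for p in PILLARS if p not in covered]


for pillar, checks in gaps_by_pillar(REVIEW).items():
    print(pillar, "->", checks)
print("Not yet reviewed:", uncovered_pillars(REVIEW))
```

Listing the pillars that have no checks at all is as useful as listing the failed checks, because it shows where the design has not been examined yet.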
I highly recommend going through the Well-Architected Framework documentation provided by each of the three CSPs, linked below, and understanding it properly before starting the actual design of the architecture.👇
https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html
https://learn.microsoft.com/en-us/azure/well-architected/
https://cloud.google.com/architecture/framework
Once the architecture has been created, reviewed and approved by the client, you need to create a cost estimation sheet listing the pricing of each cloud resource used in the designed architecture. Then share it with the customer so that they can review it and compare it against the budget they have defined internally for the project.
Where will you get the pricing for the cloud resources?
Each CSP provides an online cost calculator that can be used for this: the AWS Pricing Calculator, the Azure Pricing Calculator and the Google Cloud Pricing Calculator.
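The cost estimation sheet itself is mostly arithmetic: quantity × unit price × hours, summed and compared with the budget. Here is a minimal Python sketch, assuming every line item is billed hourly and using 730 hours per month (365 × 24 / 12). The resource names and unit prices are placeholders only; copy the real figures from the CSP's pricing calculator, and model storage or data transfer, which are usually billed per GB, separately.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 365 days * 24 hours / 12 months


@dataclass
class LineItem:
    resource: str
    quantity: int
    unit_price_per_hour: float  # placeholder: take the real price from the CSP calculator

    def monthly_cost(self) -> float:
        return self.quantity * self.unit_price_per_hour * HOURS_PER_MONTH


def estimate(items, monthly_budget):
    """Summarize the sheet and compare the monthly total with the client's budget."""
    total = sum(item.monthly_cost() for item in items)
    return {
        "monthly_total": round(total, 2),
        "annual_total": round(total * 12, 2),
        "within_budget": total <= monthly_budget,
    }


# Hypothetical line items for the sample AKS design; the prices are NOT real CSP prices.
sheet = [
    LineItem("AKS worker node VM", 3, 0.20),
    LineItem("Managed PostgreSQL instance", 1, 0.50),
    LineItem("Load balancer", 1, 0.03),
]

for item in sheet:
    print(f"{item.resource:30} {item.monthly_cost():>10.2f}")
print(estimate(sheet, monthly_budget=1000))
```

Keeping the sheet as structured line items makes it easy to rerun the comparison when the client asks "what if we use two nodes instead of three?".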
What comes next after the client approves the design and the cost? Do you think your job is done and you are free to go?
The answer is again an absolute “No”, because the next questions raised are:
- Who is going to implement the architecture physically?
- Do they have the capability to implement the architecture independently?
- Do they need any assistance from the architect while planning the implementation?
- What strategy are they going to follow during implementation? (i.e. manual implementation using the console and CLI, or automated pipeline-based deployment using some kind of IaC)
- What should the landing zone configuration be?
- What authentication and authorization mechanism should be followed?
- How will the network be planned for the implementation? (i.e. IP addressing, VNets/VPCs, subnets, VPNs etc.)
- How many private endpoints and private links need to be configured?
- If they are using IaC, do they have to develop the code modules, such as Terraform modules, themselves?
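For the network planning question, a first-cut address plan can be generated and checked programmatically. The Python sketch below carves a VNet/VPC range into equal subnets per tier and reports overlapping ranges, because peered networks cannot use overlapping address spaces (and overlaps with on-premises ranges complicate VPN routing). The CIDRs, tier names and /24 size are hypothetical; real subnet sizing depends on the services involved, for example the node and pod networking model chosen for AKS.

```python
import ipaddress
from itertools import combinations


def carve_subnets(vnet_cidr, tiers, new_prefix=24):
    """Allocate one /new_prefix subnet per tier from the VNet/VPC range, in order."""
    vnet = ipaddress.ip_network(vnet_cidr)
    pool = vnet.subnets(new_prefix=new_prefix)  # generator of consecutive subnets
    plan = {}
    for tier in tiers:
        try:
            plan[tier] = next(pool)
        except StopIteration:
            raise ValueError(f"{vnet_cidr} is too small for {len(tiers)} /{new_prefix} subnets") from None
    return plan


def find_overlaps(cidrs):
    """Return every pair of named ranges that overlap (these cannot be peered)."""
    nets = {name: ipaddress.ip_network(c) for name, c in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


plan = carve_subnets("10.1.0.0/16", ["aks-nodes", "ingress", "private-endpoints", "database"])
for tier, subnet in plan.items():
    print(f"{tier:18} {subnet}")

print(find_overlaps({
    "hub": "10.0.0.0/16",
    "spoke-app": "10.1.0.0/16",
    "legacy-onprem": "10.1.128.0/17",  # hypothetical on-premises range discovered earlier
}))
```

Running an overlap check against the on-premises ranges captured during discovery catches addressing conflicts before any VNet/VPC is created, when they are still cheap to fix.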
These are some of the questions that might be raised after the design and cost approval. Hence you, as the architect, should be available throughout the project journey and assist the implementation team with your expert advice, suggestions and recommendations wherever necessary.
The design you have created above is a high-level design, so you might need to break it down into smaller, feature-based components and create low-level designs/diagrams wherever necessary. Low-level diagrams contain more detail about the exact communication workflow, along with endpoint details such as IP addresses, subnet IDs, hostnames, labels and any UUIDs. Low-level diagrams help implementation teams and other stakeholders understand the architecture and its communication workflow better.
I would also like to highlight one very important point: the architecture design we have created is based on the information gathered during discovery, our understanding and knowledge, and the best practices followed in the industry. It is still just a design, so there is no way to fully validate that it will work flawlessly until it has been implemented. We can validate it only after the base implementation of the architecture, and actual validation and testing can be performed only after hosting the new application on the new platform in a real testing and QA environment. If any error, bug or flaw is identified after implementation, we have to take corrective action to fix it, and we may also need to make small changes to the architecture we have created.
If possible, it is recommended to build a proof of concept (POC) in a sandbox environment before the actual implementation and make corrections there and then. This helps identify flaws, errors and bugs early, to some extent, before the architecture is implemented in the test and QA environment.
I believe that is enough for a single blog post, so instead of stretching it further, let’s end it here. I hope you found this post informative and can take away some valuable information that will help you in your career.😊
Happy learning and knowledge sharing!👍




