How to set up your data infrastructure

This is a story about how I set up data infrastructure within an engineering-focused organization, from a grassroots perspective.

Setting the stage: I had just started this job and was partially tasked with creating an app using Microsoft Power Apps to automate some administrative tasks. To a non-technical person, Power Apps is amazing. It's a drag-and-drop low/no-code platform where you can create and deploy apps, and it has many of the things you require.

Now, that sounds great: you can deploy apps with minimal knowledge, you don't have to worry about them breaking, and you have an acceptable range of features for making things like forms and basic CRUD apps. It's actually quite impressive how Microsoft managed to combine so many different concepts into a product that is relatively easy to use.

And here's why you shouldn't use Power Apps for everything: Power Apps is a good tool in the right case, but like its cousin Excel, it shouldn't be the only hammer you own, or every problem starts to look like a nail. We have all seen really smart people create really impressive Excel sheets. It is a tried and true tool, and people often reach for it, even as problem complexity increases. Power Apps solves many of the problems Excel does, and handles some of them better. Imagine 20 people trying to add their names and info to a spreadsheet at the same moment; a form is easier.

You didn't come here to read about Excel vs Power Apps, and that's not the point of this post. However, the difference between the two, and the hidden complexity behind them, serves as a really good lesson. Many of the challenges I faced when setting up infrastructure were human ones. People cling to tools they have spent hours learning and have used to solve many of their problems, even when there is a better way of operating. Here's how I navigated that problem.

How to set up your data infrastructure

Determine your 5 Vs

Volume, Velocity, Variety, (Value, Veracity). Figure out how big your data is, how quickly it is produced, what types it comes in (text, structured, etc.), the business impact of that data, and how accurate it is.

Figuring out where all your data is coming from and how it needs to be combined is the first step. Then, figuring out the technical parameters of the data is a great first glance at the complexity of your situation.
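A quick way to make this concrete is to write the 5 Vs down per data source and roll them up. Here is a minimal sketch of that idea in Python. The DataSource fields, the 1-to-5 value scale, and the example entries are all hypothetical; swap in whatever estimates you actually have.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One place data lives. All fields are rough estimates, not measurements."""
    name: str
    size_gb: float         # Volume: how big it is today
    new_gb_per_day: float  # Velocity: how quickly new data arrives
    kind: str              # Variety: "structured", "text", "images", ...
    value: int             # Value: business impact, 1 (low) to 5 (high)
    trusted: bool          # Veracity: do people trust what is in it?


def summarize(sources):
    """Roll a list of DataSource entries up into a one-glance summary."""
    return {
        "total_gb": sum(s.size_gb for s in sources),
        "gb_per_day": sum(s.new_gb_per_day for s in sources),
        "kinds": sorted({s.kind for s in sources}),
        "high_value": [s.name for s in sources if s.value >= 4],
        "untrusted": [s.name for s in sources if not s.trusted],
    }


# Hypothetical inventory, for illustration only.
sources = [
    DataSource("timesheets.xlsx", 0.5, 0.0, "structured", 4, False),
    DataSource("sensor_logs", 120.0, 2.5, "text", 3, True),
]
summary = summarize(sources)
print(summary)
```

Even a rough table like this tells you whether you are dealing with a few spreadsheets or something that needs real storage and compute, and which sources are both important and not yet trusted.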

What infrastructure type do I need?

Figure out which platforms you already have access to. The last thing you want to do is set up an entire Azure instance when everyone else in your company is using Amazon. Sometimes this is fine, but ask around and see who is using what. If you're in a big organization, there are likely some resources available that are already paid for.

If you find that only Excel is being used, you can now decide, based on your use case, which "as a service" (aaS) level is best for you. Maybe there's already a pre-built SaaS solution? Research it. PaaS provides developers with a "Platform" to build, deploy, and manage applications. Power Apps is a PaaS: its infrastructure is managed by Microsoft, and you create your apps on top of it. Another good example is Databricks, which gives users a platform for working with data and manages the compute infrastructure and the storage for data warehousing. In this scenario, you develop the software and tools you need on a managed platform.

IaaS is paying Amazon/Microsoft/Google/etc to run servers for you instead of self-hosting and maintaining them yourself. You have to manage, maintain (the code, not the hardware now), and configure everything. IaaS also offers a lot of integratable services. This is an overly simple explanation, but the point is to show that the step up from Excel documents isn't straight into the deep end. There are levels, and you should choose the one that is right for you.

How I solved it

I had developed some useful Python tools, put them on our Git, and they were being used in our office. We didn't need 10 versions of the same thing, and this was a good first step toward standardizing. But now we wanted to start building local apps. They needed a database to be maintained and some UI elements, and they were more complex than Power Apps was prepared for.

I explored many options: IaaS, PaaS, SaaS, self-hosted, everything. I believe each step up the aaS ladder requires additional justification. If there is a pre-built SaaS that actually fits your use case, developing your own novel solution is probably more costly and time-consuming than paying for the SaaS. In my case, there was free software, but I would have had to figure out how to self-host it, and it didn't do exactly what I needed. Effectively, there wasn't a pre-built solution. As a note, if you need more than one SaaS to solve your problem, SaaS probably isn't your solution. You shouldn't need to link together 5 different services.
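The ladder reasoning above can be sketched as a small function. This is only an illustration of my rule of thumb, not a formal method: the function name, its parameters, and the "small team leans toward PaaS" rule are assumptions for this example.

```python
def choose_tier(saas_products_needed, saas_fits_use_case, small_team):
    """Pick the lowest-effort tier that is justified (hypothetical helper).

    saas_products_needed: how many separate SaaS products it would take
    saas_fits_use_case:   True if the product actually does what you need
    small_team:           True if few people are available for upkeep
    """
    # One pre-built product that really fits beats building your own.
    if saas_products_needed == 1 and saas_fits_use_case:
        return "SaaS"
    # Needing several products glued together means SaaS is not the answer.
    # Assumption: a small team benefits most from a managed platform.
    if small_team:
        return "PaaS"
    # Otherwise you are managing and configuring everything yourself.
    return "IaaS or self-hosted"


print(choose_tier(1, True, True))    # SaaS
print(choose_tier(5, True, True))    # PaaS
print(choose_tier(0, False, False))  # IaaS or self-hosted
```

The point is the order of the checks: you only move on to the next tier once the simpler one has been ruled out.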

We started to use a PaaS for development since it managed many of the back-end services for a small developer team. In many cases, for a small team, this is the most approachable solution for data analysis on top of an existing system. It solved many of our problems and allowed us to deploy apps and store our data. We even had leadership's support after they saw what we were able to accomplish, and other people started asking us how we could solve their problems. One of them needed GPUs for some ML work and was preparing to purchase expensive hardware for the project. In this case, it made a lot more sense to rent GPUs for hundreds of dollars for the length of the project than to spend thousands on PCs.
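The rent-versus-buy comparison is simple arithmetic, and writing it down is what makes it convincing. Here is a minimal sketch; the hourly rate, project hours, and purchase price are made-up placeholder numbers, not the figures from our project.

```python
def rental_cost(hourly_rate, hours):
    """Total cost of renting a GPU for a given number of hours."""
    return hourly_rate * hours


def break_even_hours(purchase_price, hourly_rate):
    """GPU-hours of use at which buying costs the same as renting."""
    return purchase_price / hourly_rate


# Hypothetical numbers, for illustration only.
HOURLY_RATE = 1.50        # dollars per GPU-hour
PROJECT_HOURS = 200       # expected GPU-hours for the whole project
PURCHASE_PRICE = 3000.00  # dollars for a GPU workstation

rent = rental_cost(HOURLY_RATE, PROJECT_HOURS)
threshold = break_even_hours(PURCHASE_PRICE, HOURLY_RATE)

print(f"Renting for the project: ${rent:.2f}")
print(f"Buying only pays off after {threshold:.0f} GPU-hours")
```

This ignores things like electricity, maintenance, and resale value, but if the project needs far fewer hours than the break-even point, renting is the easy call.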

The platform we were using was equipped for the complexity of the problems we were facing, and the 5Vs of our data. We were solving problems in our team and in others. We also had buy-in from leadership to continue.

Not everyone leapt to switch, however. Many people wanted to create apps like ours, but didn't. Some didn't want to move their workflow or learn how to use the platform. The technical part was much easier in this case than the human part; not everyone wanted to adopt it.

Conclusion

The biggest lesson from this experience wasn't technical; it was learning to meet people where they are. Instead of trying to convert everyone to a new workflow at once, we found success by identifying specific pain points that our infrastructure could solve better than existing tools. The GPU champion became our proof of concept not because they loved cloud architecture, but because they had a real problem with a clear cost comparison. From there, we could point to concrete successes when other teams asked what we could do for them. If you're setting up data infrastructure from a grassroots position, focus on building momentum through targeted wins rather than trying to revolutionize everything at once. Find the people who are already frustrated with their current tools or facing expensive problems your solution can address. Start small, solve real problems, and let success stories do the convincing for you.