How To Use Data Engineering Projects To Push Your Skills To The Next Level

Bob Wakefield
Data Driven Perspectives
6 min read · Feb 12, 2022

These days it seems like every online coding class has “real world” projects for you to complete. The ones I have seen are not impressive by my standards.

When I created Fundamentals Of Database Programming, I wanted to make sure I didn’t create the standard substandard projects that other instructors were giving their students.

Those of you who continue to profess a belief in Udemy will receive the standard substandard training, which will result in your eventual elimination.

In my opinion, as a data engineering apprentice, having projects that mirror the real world is critical to your professional development. This is especially true if you haven’t landed work yet.

There might be a gap between when you finish training and when you’re hired. Use this period to expand on the information you learned in class.

Projects should come AFTER you have completed the class. You’re not even allowed access to my projects until you have completed the class and demonstrated mastery of the material.

My goal is to produce fully qualified entry-level data engineers, and project work is part of that educational process. Throughout DENG 100, I teach from the perspective of the real world. Most of the early homework is simplistic, as it should be. Even the later, more challenging work is often done in a perfectly clean environment, with queries written in isolation from larger systems, which is certainly not real world. I do these things for pedagogical reasons, making the appropriate tradeoffs between real-world scenarios and practical teaching tools.

However, in the real world, data is messy. Environments are messy. You’re not going to get a lot of direction on projects because you’re a fully qualified data engineering professional. You’ll be expected to use your skills and intelligence to synthesize solutions in uncertain environments. That’s the job. My projects reflect this reality.

My projects, which I call data engineering challenges, have the following characteristics:

1. They are literally real world. Each one is either a replicated version of something I’ve built for a client or part of a project I’m currently working on. The really interesting ones are problems that I haven’t cracked.

2. They are projects that exist within a business context.

3. They are structured like the business requirements you’d get on the job.

4. There are NO specific instructions that tell you how to do things. You have to build the solution using your unique approach.

5. They go beyond what is taught in class. Just like in the real world, you’ll have to use resources like Google and Stack Overflow to get the job done. Many challenges come with additional training.

6. They range in difficulty from beginner to legendary. Some can be done on your local machine. Some you may want to spin up a cloud service to complete. Some might make you question your life choices.

DENG 100 vs Those Other Guys

Below is an example of one of my data engineering challenges.

Project Description

Automate the acquisition of an SFTP server public key fingerprint

Doing the work necessary to determine if your SFTP connection is secure without having access to the host key is annoying. The process can and should be automated.

Business Requirements

Create a process whereby you automate verification of a host key based on its fingerprint.

Project Resources

The following information may help you in your task.

Test SFTP Server

Python — pysftp / paramiko — Verify host key using its fingerprint

I never solved this problem. Good luck. We’re all counting on you.
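One small piece of a challenge like this, the fingerprint comparison itself, can be sketched with nothing but the Python standard library. This is a minimal sketch under stated assumptions, not a solution to the challenge: it assumes you already have the server's public key in the usual one-line OpenSSH format, and that the fingerprint you trust is published in the `SHA256:...` form. The function names are hypothetical and invented for illustration. Actually retrieving the key from a live server is deliberately left out.

```python
import base64
import hashlib
import hmac


def blob_from_public_key_line(line: str) -> bytes:
    """Return the raw key blob from a line like 'ssh-ed25519 AAAA... comment'.

    Assumes the second whitespace-separated field is the base64 key data.
    """
    fields = line.split()
    return base64.b64decode(fields[1])


def sha256_fingerprint(key_blob: bytes) -> str:
    """Build a 'SHA256:...' fingerprint: base64 of the digest, padding stripped."""
    digest = hashlib.sha256(key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_matches(key_blob: bytes, expected_fingerprint: str) -> bool:
    """Compare the computed fingerprint with the one you trust."""
    actual = sha256_fingerprint(key_blob)
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(
        actual.encode("utf-8"), expected_fingerprint.strip().encode("utf-8")
    )


if __name__ == "__main__":
    # Placeholder bytes, NOT a real host key; swap in the blob from your server.
    fake_blob = b"placeholder key bytes"
    trusted = sha256_fingerprint(fake_blob)
    print(host_key_matches(fake_blob, trusted))
    print(host_key_matches(b"some other key", trusted))
```

The hard part of the challenge is everything around this: getting the key off the server before you trust the connection, and deciding where the trusted fingerprint comes from. If you go the paramiko route, my understanding is that a connected `Transport` exposes the server key through `get_remote_server_key()` and the key object can be turned into bytes with `asbytes()`, but treat that as an assumption and check it against the paramiko documentation before relying on it.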

So how DO you use projects to push your skills? By doing the following.

1. Get your GitHub repo set up and put all of your work in there so it can be showcased later. The goal here is to create a portfolio of work.

2. Make sure you finish the coursework before you tackle any project. Remember, a project is something you do AFTER you’ve trained.

3. Make sure any project you do has the characteristics I laid out above. If the instructor is claiming real world, make sure it is indeed real world!

4. If your instructor offers projects that do not meet the real-world standard, there are many projects available on the internet for sale and for free. Practity is a purveyor of such wares.

5. Start doing projects and KEEP doing them until you feel like you’ve seen it all. This is key. If you don’t have a data engineering job yet, you should certainly be doing projects in your spare time.

6. If you DO have a data engineering job, KEEP doing projects. The problem with one particular job is it only exposes you to the problems unique to that organization. Those problems may not be the same elsewhere, so you want to make sure you keep getting exposed to unique problems.

7. Start to winnow your project portfolio down to the most badass ones. Pick the ones that are hard, complex, and that you are super proud of. Delete the others out of your repo.

8. In addition to blowing away unimpressive work, pretty up your GitHub profile. Fill it out as completely as possible. Pin your best work to the profile page. For an example (that currently isn’t that impressive), you can see mine here.

9. Once you feel like you have a respectable group of projects, clean up your repos and put the link to your GitHub profile on your resume.

10. Here comes the really important part and I’m going to break it out of this listicle because it’s complex.

This is the power move.

This is why you want to have a portfolio of publicly available work.

In the interview process, you may be asked to take a coding exam or a take home project. These things are insidious, a waste of time for everybody involved, and usually nothing more than a selling point for recruiters to get clients. “We screen all our candidates with this exam!”

Exams, take home or no, are essentially free work, which you should in no way, shape, or form feel you have to do in order to get a job. You’re a highly trained and skilled professional. You trained for months! Maybe years depending on your situation! Nights and weekends! You made sacrifices! You’re in data special operations! Why on Earth should you have to prove yourself AGAIN?

Nah.

If someone asks you to take a coding exam you say, “OH HEEEELLL NO!”

Kidding. Don’t say that.

Politely decline and point them to your publicly available repo. Hopefully there is something in there that is directly relevant to the job you’re applying for. If there is, point them specifically to that.

If they push back, you let them know that you’re not jumping through hoops for a job. If they can’t respect that, then you probably don’t need to be working there anyway.

If the whole point is to prove you can do the job, then that can be done with your publicly available work and verbally in an interview with some coding trivia questions to ensure you really know your stuff. No silly code exam required!

And that, kiddies, is how you put the smack down Bob style!

Writing software is as much of an art as it is a science. Just like a visual artist, you should have a portfolio of work to prove how badass you are.

So, projects serve two purposes:

1. To push your skills to the next level.

2. To let the whole world know just how hard in the paint you go with your craft!

Living at the intersection between finance, economics, and data science/engineering. Follow me on Twitter! @BobLovesData