1. Python, data science, & software engineering
Welcome to the course "Software Engineering for Data Scientists in Python".
My name's Adam Spannbauer. I’m a data scientist from Tennessee who likes to code in both R & Python.
In this course, we'll be covering software engineering concepts that can revolutionize your Data Science workflow.
2. Why software engineering?
So why should a Data Scientist care about the principles of Software Engineering?
3. Why software engineering?
Many Data Scientists, like myself, start in the world of math & statistics.
4. Why software engineering?
And soon after, you learn that communicating these technical ideas is just as important as understanding them yourself.
5. Why software engineering?
However, many Data Scientists are self-taught programmers who see coding as a means to an end; it's just a step to create a model or run a simulation.
6. Why software engineering?
But in reality, Software Engineering is a valuable skill that needs to be learned and practiced. This course will help build your software engineering skills by introducing some very important concepts.
7. Software engineering concepts
Three topics in particular that we'll cover are modularity, documentation, and testing.
Another topic worth mentioning is Version Control. We won't be covering it in this course, but I recommend checking out the dedicated DataCamp course to learn more about this powerful tool.
Let's define each of these concepts that we'll be covering.
8. Benefits of modularity
First is modularity.
To introduce modular code, let's start by defining what it's not. Non-modular code can take the form of long, complicated, hard-to-read scripts and functions.
Programming becomes less complex when your code is divided into shorter functional units, and this is the whole idea behind modularity.
With modular code, not only does code become more readable, but it also becomes easier to fix when something breaks. Additionally, modular code is easier to take along with you to your next project, which saves time because you avoid re-solving problems you've already solved in a previous project.
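As a rough illustration of "shorter functional units", here is a minimal sketch that splits a small text task into separate functions. The function names (`tokenize`, `count_words`, `most_common_word`) are hypothetical and chosen only for this example; they are not taken from the course materials.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase words, stripping common punctuation."""
    words = (word.strip(".,!?;:").lower() for word in text.split())
    return [word for word in words if word]


def count_words(words):
    """Count how many times each word appears."""
    return Counter(words)


def most_common_word(text):
    """Combine the small units above to find the most frequent word."""
    counts = count_words(tokenize(text))
    return counts.most_common(1)[0][0]


print(most_common_word("The cat saw the dog. The dog ran."))
```

Because each step lives in its own function, a bug in tokenizing can be fixed in one place, and `tokenize` can be reused in another project without copying the rest.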
9. Modularity in Python
We can write modular code in Python by leveraging packages, classes, and methods.
In this example code, we import the pandas package. We create a new object using the powerful DataFrame class from pandas. Finally, we use the convenient plot method that comes built into the DataFrame class.
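The slide's code is not part of this transcript, so the following is only a sketch of what such an example might look like. The column names and values are made up for illustration, and the `Agg` backend line is an assumption added so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so no display is needed

import pandas as pd  # import the pandas package

# Create a new object from the DataFrame class (illustrative data)
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

# Use the plot method built into the DataFrame class
ax = df.plot(x="x", y="y")
```

Here the package (`pandas`), the class (`DataFrame`), and the method (`plot`) are the three levels of modularity mentioned above.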
Later in this course, you'll write a fully functional Python package using all these concepts to perform text analysis.
10. Benefits of documentation
If you work on a team or if you ever publish a project, then other people will need to read your code, and sometimes, the other person is just future you. Both future you and other people will have an easier time reading your project if you use good documentation practices.
In this course, we'll cover how to use comments, docstrings, and self-documenting code to document your Data Science Python projects. A project using all these techniques can save a lot of confusion and frustration for everyone who touches the code.
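To make the three documentation techniques concrete, here is a small sketch that uses all of them in one place. The function `mean_word_length` and its behavior for an empty list are hypothetical choices for this example, not something defined in the course.

```python
def mean_word_length(words):
    """Return the average number of characters per word.

    :param words: list of strings to measure
    :return: mean length as a float, or 0.0 if the list is empty
    """
    # Comment: explain *why*, here guarding against division by zero
    if not words:
        return 0.0

    # Self-documenting code: descriptive names say what each value is
    total_characters = sum(len(word) for word in words)
    return total_characters / len(words)


print(mean_word_length(["data", "science"]))
```

The docstring tells a user how to call the function, the comment explains a decision, and the variable names reduce the need for further explanation.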
11. Benefits of automated testing
The last software engineering concept we'll touch on is testing. One thing everyone learns is that people make mistakes. That's why pencils have erasers, why computers have spellcheck, and why software needs tests.
Oftentimes, data scientists will write some code, test it in the console once, and then never test it again. It's definitely worthwhile to perform these manual tests, but tools like the pytest package can automatically run and re-run your tests to ensure your code is working as intended even after you add new functionality.
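As a minimal sketch of what an automated test can look like, the example below defines a hypothetical `square` function and two test functions written in the style pytest collects (names starting with `test_`, plain `assert` statements). Saving it as a file such as `test_square.py` (an assumed filename) and running `pytest` in that directory should discover and run both tests.

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x


def test_square_positive():
    # A manual console check, captured so it can be re-run at any time
    assert square(3) == 9


def test_square_negative():
    # Squaring a negative number should give a positive result
    assert square(-2) == 4
```

Once checks like these are written down, they can be re-run after every change instead of being retyped in the console.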
12. Let's Review
Ok, we went over some pretty big topics. Let's do some exercises to review.