Pro Python Tips for Data Analysts
Complex analysis requires complex code. How do you keep that code tidy and ready to evolve and improve?
The dream
The dream is to create sleek code that clearly expresses the steps between the problem and the solution.
The reality
- A trail of rubbish code mixed in with the good stuff
- Too much to remember
- Too much code
- Too fragile
Towards the dream – top tips
Here are my top tips for improving your data analysis code.
- Clean your code as you go along. Remove any dead ends. It will help with the next steps.
- Be consistent. Adopt a coding standard. Check out PEP 8 and Black. Less to remember, and code that is easier to read.
- Use meaningful names. Avoid generic names like ‘data’ or ‘df’. Descriptive names make it easier to understand what is happening.
- Use chaining. Feed the output of one step directly into the next, in a single expression. Check out Matt Harrison’s video on Idiomatic Pandas. Done correctly, chaining makes your code more readable and lets you rerun individual cells.
- Re-use your own code – without copying. Copied code is difficult to maintain. Learn about Python’s loops and functions and use them.
- Use other people’s code. Search in the Python Package Index, GitHub, Stack Overflow or Kaggle. Start your next ML project with an open source model. There is so much amazing stuff out there – use it and save yourself some time.
- Test your code with automated tests. Use pytest (with nbmake for notebooks). High test coverage lets you evolve your code with confidence.
- Know the difference between a script and a notebook. Scripts are great for complex code. Notebooks are great for experimenting and telling the story.
- Use version control. Change your code without fear. Your version control remembers previous versions and lets you roll back your changes.
- Keep learning. Copy code off the internet, but understand how it works before moving on. Make learning part of your routine: read, listen, watch. Pick up ideas, tools and libraries for writing better code.
- Use the best tools, and know how to use them. Learn Jupyter’s keyboard shortcuts. If you write scripts, use an IDE. Good tools will make your data analysis more efficient and fun.
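To make a few of these tips concrete, here is a minimal sketch that combines meaningful names, chaining, a reusable function and a pytest-style test. The column names (region, units, unit_price), the function name and the sample numbers are all hypothetical, invented purely for illustration. Treat it as one possible shape for such code, not a recommended implementation.

```python
import pandas as pd


def summarise_revenue_by_region(raw_sales: pd.DataFrame) -> pd.DataFrame:
    """Return total revenue per region, largest first.

    Hypothetical example: assumes columns 'region', 'units' and 'unit_price'.
    """
    return (
        raw_sales
        # Drop rows where the inputs to the revenue calculation are missing.
        .dropna(subset=["units", "unit_price"])
        # Add a derived column without modifying the caller's DataFrame.
        .assign(revenue=lambda sales: sales["units"] * sales["unit_price"])
        # Total the revenue for each region.
        .groupby("region", as_index=False)["revenue"]
        .sum()
        # Largest region first, with a clean 0..n-1 index.
        .sort_values("revenue", ascending=False)
        .reset_index(drop=True)
    )


def test_summarise_revenue_by_region():
    # Tiny made-up dataset, including one incomplete row.
    raw_sales = pd.DataFrame(
        {
            "region": ["North", "South", "North", "South"],
            "units": [2, 1, 3, None],
            "unit_price": [10.0, 5.0, 10.0, 5.0],
        }
    )

    revenue_by_region = summarise_revenue_by_region(raw_sales)

    assert revenue_by_region["region"].tolist() == ["North", "South"]
    assert revenue_by_region["revenue"].tolist() == [50.0, 5.0]
```

Because the steps live in a function, the same logic can be applied to any number of datasets without copying it, and running pytest on the file will pick up the test function automatically.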
Take a bit of time to become a better programmer and to write better code. Learn to write sleek and robust code and sail off into a bright future.
This is a summary of an article I wrote for Aigents about a conference talk I gave at PyCon Sweden 2021.
See the full article, including links to recommended tools, packages and articles, on the Aigents website.