Data sets come in all shapes and sizes: big or small, tall or wide, ragged, neat or "awkward", clean or dirty, sparse or dense. In this class you'll learn how to deal with them all to get the answers you're after in a workflow well suited for scientific, engineering, and financial analysis.
Overview
In the first half of the class, we'll take a structured approach to data analysis. We'll walk through each step: finding and loading data, giving it that important first look, preparing it for analysis, and finally analyzing and modeling it. We'll discuss each step and then introduce an assortment of tools for carrying it out. We'll talk about "tidy" and "wide" data formats, what each is good for, how to get your data into them, and how to convert between them. We'll cover a variety of storage formats and when you'd want to use each one. We'll discuss different visualization tools and get some practice with them. Finally, we'll talk about setting up data analysis workflows and fostering a healthy data culture at your organization.
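Here is a small taste of the tidy-versus-wide distinction. This is a minimal pandas sketch, not taken from the course materials. The table, its values, and the column names (`sample`, `day1`, `day2`, `reading`) are made up for illustration. `melt()` turns a wide table into a tidy (long) one with one row per observation, and `pivot()` turns it back.

```python
import pandas as pd

# Made-up "wide" table: one row per sample, one column per measurement day.
wide = pd.DataFrame(
    {"sample": ["A", "B"], "day1": [1.0, 2.0], "day2": [1.5, 2.5]}
)

# Wide -> tidy: one row per (sample, day) observation.
tidy = wide.melt(id_vars="sample", var_name="day", value_name="reading")

# Tidy -> wide again, with "sample" as the index and one column per day.
back_to_wide = tidy.pivot(index="sample", columns="day", values="reading")

print(tidy)
print(back_to_wide)
```

Tidy data tends to suit grouping and plotting. Wide data is often easier to read at a glance and to feed into array-oriented calculations.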
In the second half of the class, we pivot to a "Pandas Practicum," where we focus on implementing each step of the data analysis workflow in Pandas. Pandas is a popular Python library for working with labeled, indexed data. It trades some of the n-dimensional generality of NumPy's ndarray for the more restricted but broadly useful indexed, column-oriented DataFrame, and in exchange it provides tools that cover the entire data analysis workflow. We'll walk through each of the steps from the first half of the class with live demos and both short and long exercises, giving you the practical experience you need to master the fundamentals of Pandas. We'll make sure you're comfortable with the sometimes-confusing distinction between positional and label-based indexing. We'll also make sure you know how to leverage the DataFrame's index for pivoting, merging, joining, stacking, and unstacking.
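To show why positional and label-based indexing can be confusing, here is a minimal sketch using hypothetical data. It is not from the course materials, and the column names `temp`, `site`, `day`, and `value` are illustrative only. When the index labels are integers that don't match row positions, `.loc` and `.iloc` return different rows. A `MultiIndex` then lets `unstack()` reshape the data.

```python
import pandas as pd

# Hypothetical data whose integer index labels differ from row positions.
df = pd.DataFrame({"temp": [21.5, 19.0, 23.1]}, index=[2, 0, 1])

by_label = df.loc[0, "temp"]  # label-based: the row labeled 0 -> 19.0
by_position = df.iloc[0, 0]   # position-based: the first row -> 21.5

# Using the index for reshaping: move "day" from the index into the columns.
readings = pd.DataFrame(
    {"site": ["X", "X", "Y", "Y"], "day": [1, 2, 1, 2], "value": [0.1, 0.2, 0.3, 0.4]}
)
table = readings.set_index(["site", "day"])["value"].unstack("day")

print(by_label, by_position)
print(table)
```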
No Such Thing as a Dumb Question
At Diller Digital, we love student questions! As we like to say: if you have a question, chances are good someone else in the class has it too, and they'll thank you for speaking up. Questions and problems provide good context for learning, so we encourage you to bring your specific use cases to class and take advantage of the opportunity to interact with your live instructor. That's the Diller Digital Difference you won't get in books, blogs, or videos!
Results
By the end of Data Analysis with Pandas for Scientists & Engineers, you will have experience working with data through each stage of the data analysis workflow. You will have a collection of scripts, notebooks, and functions that you wrote during class as well as a rich collection of demos and examples that come with the class materials. You will have a structured way to approach data analysis problems and a full toolbox to solve them.
Data Analysis with Pandas for Scientists & Engineers can set you up for success with Machine Learning for Scientists & Engineers or Deep Learning for Scientists & Engineers. Check out the course catalog for the next available dates, or email info@dillerdigital.com if you don't see a date that suits you.
Testimonials
"Materials were well structured and instructors were prepared and kept a good space."
- Analyst at a large US financial institution
"There are some materials you do not easily search online"
- Different Analyst at the same large US financial institution
"I think the difference between wide and tidy data will help me a lot with working with my own data. I also think I will get a lot out of knowing how to rearrange and relabel DataFrame information."
- Engineer at Sandia National Laboratories