Data Science has grown steadily as a field in recent years. Merging statistics, computer science, mathematics, and domain knowledge, its history goes back to the 1960s, when professionals started exploring the possibilities of using data for decision-making. Initially, the main focus was on using data analysis and statistical modeling to understand datasets. The field started to take off with the generation of digital data from diverse sources. This surge in data availability gave rise to Data Science as we know it today, accompanied by the concept of big data.
Data-driven companies like Facebook, LinkedIn, and Google are some of the main contributors to popularizing the field. Currently, many industries, including healthcare, technology, marketing, and finance, rely on employees well-versed in data science to help research and develop new solutions to problems for consumers and society at large. As civilization becomes more data-driven, data science will continue to be an important area of study for those wanting to have an impact on shaping the future.
Given the above, and as a computer science professor, I was very interested in the opportunity to review the recently launched Professional Certificate by IBM on Coursera focusing on data science. Would a program delivered by a leading IT company be a good complement to the classroom courses I normally teach on the subject?
Overview of the IBM Data Science Professional Certificate
This professional certificate is designed to provide participants with the knowledge, resources, and portfolio needed to stand out as an entry-level data scientist in the job market. No prior computer science or programming experience is necessary. Databases, data visualization, statistical analysis, predictive modeling, machine learning algorithms, and data mining are just a few of the skills participants will work to master. Additionally, participants will get acquainted with Python, SQL, Jupyter notebooks, GitHub, RStudio, pandas, NumPy, scikit-learn, Matplotlib, and other languages and tools. From the perspective of an academic, it is also particularly interesting that this certification is ACE recommended, meaning that 12 college credits can be earned by completing the program (although it is at the discretion of your university whether to recognize them).
All the contents of the course are online, and it is self-paced – participants can study the course according to their own schedule (however, note that as you are paying monthly, the cost goes up the longer it takes you to complete it). The standard completion time is around 5 months, but if you have a lot of time to dedicate or you are already familiar with the content of some of the courses, you can complete this faster.
How much does the Professional Certificate cost?
The IBM Data Science Professional Certificate is competitively priced, with a monthly subscription of $49 and a self-paced completion time of about five months. At the recommended pace, the total cost therefore comes to $245 (5 × $49). Overall, the program offers a valuable learning experience and is a worthwhile investment.
Note, however, that this Professional Certificate is not included in Coursera Plus subscriptions but needs to be purchased separately. On the other hand, if you are on a budget, the self-paced format means you can aim to complete the program in less than the recommended five months to save money. As with most Coursera courses, it is also possible to audit the individual courses of the Professional Certificate for free.
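Since the price depends entirely on how long you stay subscribed, it can help to run the numbers before you start. The snippet below is only a minimal sketch: it assumes the $49 monthly fee quoted above stays fixed and that billing is per whole month, and the function name total_cost is purely illustrative.

```python
MONTHLY_FEE_USD = 49  # monthly subscription price quoted in this review


def total_cost(months: int, monthly_fee: int = MONTHLY_FEE_USD) -> int:
    """Return the total subscription cost for a whole number of billed months."""
    if months < 1:
        raise ValueError("at least one month is billed")
    return months * monthly_fee


# Compare a fast pace, the recommended pace, and a slow pace.
for months in (3, 5, 8):
    print(f"{months} months: ${total_cost(months)}")
```

Finishing in three months instead of five would save two monthly payments under these assumptions, while stretching the program out to eight months adds three.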
My overall impression
Because the program is divided into ten component courses, participants with no knowledge of Data Science can start from scratch, while those with pre-existing knowledge can move quickly through the introductory courses. However, a disadvantage is that the program is still primarily suitable for beginners. Participants at an intermediate level who want to progress to an advanced level will not find the first four courses useful.
The component courses are delivered using a mix of videos, reading material, frequent quizzes, and one or more peer-graded assignments per course. My overall impression is that the program features a good balance of the different types of material and assessment methods, even if, as with many Coursera courses, there can be some disconnect between the different materials at times (presumably because they have been updated piecemeal rather than comprehensively).
Detailed review: The component courses
Course 1: What is Data Science
The initial 7-hour course within the certification program is titled “What is Data Science,” serving as an entry point for beginners interested in Data Science. The course gives a brief overview of Data Science and its key topics, how to become a data scientist, big data, Hadoop, data mining, deep learning, and neural networks. It also covers the use of Data Science in various business domains. The course gives beginners a solid foundation in what Data Science is, what its key ingredients are, and how it can be effectively applied in business contexts.
The instructors of the course are Rav Ahuja (Global Program Director, IBM Skills Network) and Alex Aklson, Ph.D. (Data Scientist).
The course is an excellent choice for those looking to gain a fundamental understanding of data science and its significance in various industries. With its short duration, lasting about seven hours, it efficiently covers the basic concepts of data science. The course curriculum includes comprehensive reading materials, ensuring that learners have access to essential resources to enhance their knowledge. Additionally, the quizzes and peer-graded assignments provide a valuable opportunity to reinforce their learning.
One of the disadvantages of the course is the very basic level, which can be frustrating for intermediate learners. Furthermore, some participants may feel that the topic of “Big Data” deserves its own separate course, as it represents a crucial pillar within the broader domain of Data Science. A dedicated course on this subject could offer more in-depth exploration and analysis, catering to learners seeking a deeper understanding of Big Data’s intricacies.
Course 2: Tools for Data Science
The second course within the certification program is titled “Tools for Data Science.” This is a longer course (approximately 17 hours) comprising seven main sections. The course gives a brief overview of commercial and open source Data Science tools; languages used in Data Science, such as Python and R; libraries, APIs, models, and datasets; Jupyter Notebooks, GitHub, and the RStudio IDE. It ends with an assignment that involves solving a dozen practical problems in a Jupyter Notebook. The course also contains an optional submodule on IBM Watson Studio. To build proficiency with these Data Science tools, the course offers plenty of hands-on practice. Participants can test all the tools and follow instructions to execute basic Python, R, or Scala code using the tools available in the cloud on Skills Network Labs.
Although it provides a good overview, the libraries, APIs, models, and datasets would have to be explained in greater detail to be practically useful in a professional setting, which may be a drawback of this course.
The instructors of the course are Aije Egwaikhide (Senior Data Scientist, IBM), Svetlana Levitan (Senior Developer Advocate, IBM Center for Open Data and AI Technologies), and Romeo Kienzler (Chief Data Scientist, Course Lead, IBM Watson IoT).
Course 3: Data Science Methodology
The next course is titled “Data Science Methodology,” looking at how to approach data science problems, including converting the problem into requirements, data collection, understanding the data, modeling and evaluation, deployment, and feedback. The course helps beginners get a solid foundation in problem identification, data collection, the six stages of the CRISP-DM methodology, modeling, evaluation, and feedback.
The instructors of the course are Alex Aklson, Ph.D. (Data Scientist), and Polong Lin (Data Scientist).
The course’s benefits include providing participants with practical modeling, assessment, and deployment knowledge. Participants will learn to apply the CRISP-DM methodology, which is the most widely used approach for solving data science and data mining problems. The course’s main drawback is that it needed more information on modeling and evaluation; more real-world examples would have helped students comprehend the content.
Course 4: Python for Data Science, AI & Development
The fourth course delves into using Python in Data Science and Artificial Intelligence. A relatively long course, clocking in at approximately 25 hours, it looks at Python basics, data structures in Python, fundamental programming concepts in Python, how to work with data in Python, and creating dashboards. The course contains multiple hands-on labs for practicing Python.
The instructor of the course is Joseph Santarcangelo, Ph.D. (Data Scientist at IBM).
A benefit of the course is that participants are introduced to practical Python applications for data science, including the creation of dashboards. The course’s drawback is that there should have been more reading material provided. Note that if you are not following the full program
Course 5: Python Project for Data Science
The fifth course within the certification program is the first intermediate course and is meant to be a continuation of course 4 (Python for Data Science, AI & Development) so that participants can exercise the fundamental Python skills they learned in that course. Participants are provided with a real-world dataset and scenario, and the focus of the course is to apply all the previously acquired knowledge to a practical project. The course also contains an optional module on Web Scraping.
The instructors of the course are Azim Hirjani (Cognitive Data Scientist) and Joseph Santarcangelo (Ph.D., Data Scientist, IBM).
This brief course has the benefit of being hands-on: participants’ prior Python expertise is put to the test in the hands-on project. The course is, however, unnecessarily heavy on quizzes; it would have been preferable to incorporate a second practical project in place of the quizzes.
Course 6: Databases and SQL for Data Science with Python
The sixth course introduces students to databases and SQL, which are critical skills for a data scientist. The course briefly covers writing basic SQL statements such as INSERT, UPDATE, SELECT, and DELETE; filtering and summarizing results using DISTINCT, COUNT, WHERE, and LIMIT; DML and DDL statements such as ALTER and DROP; GROUP BY and ORDER BY; building sub-queries; accessing a SQL database using Python in a Jupyter Notebook; and stored procedures, views, and joins. The course contains multiple hands-on labs for the practical demonstration of SQL queries. Already a long course at over 30 hours, it also contains an optional Advanced SQL module (upon completion of this advanced module, participants are awarded an Honors-level certificate).
The instructors of the course are Rav Ahuja (Global Program Director, IBM Skills Network) and Hima Vasudevan (Data Scientist, IBM).
The course’s benefits include teaching students how to use SQL practically through hands-on labs and giving them advanced SQL knowledge of stored procedures and transactions. The drawback is that the evaluation criteria ought to have been project-oriented rather than relying on tests.
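To give a flavor of the kind of SQL covered in this course, here is a minimal sketch of running a few of the statements mentioned above from Python. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in so that it runs anywhere; the course labs may well use a different database system, and the employees table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: an assumption for this sketch, not the course's setup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a hypothetical table.
cur.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# DML: INSERT, UPDATE, and DELETE with parameterized queries.
cur.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [
        ("Ana", "Analytics", 70000),
        ("Ben", "Analytics", 60000),
        ("Cy", "Finance", 65000),
        ("Dan", "Finance", 50000),
    ],
)
cur.execute("UPDATE employees SET salary = salary + 5000 WHERE name = ?", ("Ben",))
cur.execute("DELETE FROM employees WHERE name = ?", ("Dan",))
conn.commit()

# SELECT with DISTINCT and ORDER BY.
departments = [
    row[0]
    for row in cur.execute("SELECT DISTINCT dept FROM employees ORDER BY dept")
]

# COUNT with GROUP BY.
headcount = cur.execute(
    "SELECT dept, COUNT(*) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

# Sub-query with WHERE and LIMIT: highest earner above the average salary.
top_earner = cur.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) "
    "ORDER BY salary DESC LIMIT 1"
).fetchone()[0]

print(departments, headcount, top_earner)
conn.close()
```

The same pattern of opening a connection, executing parameterized statements, and fetching rows carries over to other database drivers, although connection details and SQL dialects differ.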
Course 7: Data Analysis with Python
The seventh course looks at data analysis using Python, covering briefly how to import and collect data, prepare and format it, manipulate data frames, provide summaries of the data, develop machine learning models, iterate on the models, and design data pipelines. The course contains a practical peer-graded assignment, looking at house sale data, which assesses the learning acquired in the module.
The instructor of the course is Joseph Santarcangelo (Ph.D., Data Scientist, IBM).
The course has the benefit of having clearly defined headers and appropriate learning resources for each topic. The main drawback is an over-reliance on quizzes – practical projects rather than quizzes should be the primary method of evaluation in these practical courses.
Course 8: Data Visualization with Python
The eighth course is the second intermediate course in the certification and briefly covers data visualization tools and chart types (graphs, word clouds, scatter plots, pie charts, bar charts, histograms, area plots, and waffle charts) as well as the visualization libraries Folium, Dash, Seaborn, Matplotlib, and Plotly. The final assignment is divided into three parts to test knowledge of all the libraries taught; the assignment titles can be viewed in Figure 10.
The instructor of the course is Saishruthi Swaminathan (Data Scientist and Developer Advocate, IBM CODAIT).
Course 9: Machine Learning with Python
The second-to-last course within the certification program (and the third intermediate course) looks at Machine Learning, including regression, classification techniques, classification algorithms, clustering, SciPy, scikit-learn, and comparison of different Machine Learning models.
The instructors of the course are Saeed Aghabozorgi, Ph.D. (Sr. Data Scientist) and Joseph Santarcangelo, Ph.D. (Data Scientist, IBM).
A drawback of this course is that, given its intermediate level, more reading materials ought to have been offered, as it is otherwise easy for someone coming to the subject fresh to get stuck. To allow the topics to be covered in more depth, the course could have been split into smaller courses, such as Supervised Learning and Unsupervised Learning. Nevertheless, the course does have merit in teaching a wide range of Machine Learning concepts in a hands-on, experiential manner.
Course 10: Applied Data Science Capstone
The last course within the certification program is the “Applied Data Science Capstone”. As should be the case for a capstone project, the course emphasizes prior knowledge rather than offering any new information. Participants in the course will work as data scientists and apply the principles learned in earlier courses to solve challenges in the real world.
The instructors of the course are Yan Luo, Ph.D. (Data Scientist and Developer, IBM) and Joseph Santarcangelo, Ph.D. (Data Scientist, IBM).
What do others say?
The majority of student reviews for the IBM Data Science Professional Certificate are favorable. Students praise the program for its practical projects, hands-on approach, and thorough treatment of data science and machine learning topics. However, some participants have voiced concerns that it does not cover advanced topics in data science and machine learning in enough depth to provide a sufficiently strong base for applying for data science jobs.
My recommendation
Having completed the Coursera Professional Certificate course “IBM Data Science,” my overall impression is that it’s a comprehensive program for providing beginner data scientists with their first set of important skills and knowledge. Participants achieve expertise in fundamental data science tools, techniques, and approaches through its well-structured syllabus and practical approach. As a student in this certification, I found that the curriculum had been carefully planned out, offering a well-balanced mix of academic ideas and practical tasks. The course’s user-friendliness and flexibility make it perfect for learners at a beginner level (and, to a certain degree, also those at an intermediate level). In addition, the inclusion of real-world projects and industry-relevant case studies helps ensure that learners are job-ready. However, I would expect that most students would need to complement this certificate with more in-depth studies and/or practical projects that can provide them with a bit more specialized expertise for them to become employable as data scientists.