How Do Super Rookies Start Learning Data Analysis?
For super rookies, the first task is to understand what data analysis is.
Data analysis is a type of knowledge discovery that gains insights from data and drives business decisions.
There are two points here. The first is how to gain insights from the data. Data is cold and can’t speak for itself. Professional data analysts need solid business knowledge in order to read from the data what has happened and what is likely to happen next. Tools for data analysis and data mining are also important: Excel, Python, Power BI, Tableau, and FineReport are all frequently used by data analysts. However, many beginners pay too much attention to tools and neglect the professional qualities that a data analyst should have.
The second is how to drive business decisions. Ordinary data analysts may not be the ones who make those decisions, but a good data analyst does need a keen business sense. Analysis results on their own are not helpful. The value of a data analyst lies in combining the results with real business scenarios to produce actionable conclusions.
I know novices are very concerned about the learning process of data analysis. You may be full of doubts and yearnings for SQL, Python, R, etc. This was also my mentality when I first came into contact with data analysis. There are so many things to learn, which one should I learn? How do I learn, and to what extent?
Now let me talk about the selection of data analysis tools.
How to Choose Data Analysis Tools?
In general, if you want to become an excellent data analyst, you should master at least three types of tools: self-service BI tools, SQL, and programming languages. The selection criteria of these three types of tools are different.
For super rookies, the priority is to learn self-service tools, to ensure that they can get started with data analysis as soon as possible, and to master the basic knowledge of data analysis. Second, learn SQL and understand the concept of database. Finally, if you want to reach a higher level, you need to learn programming languages and even data analysis libraries. Next, I will introduce the specific selection one by one.
1. Self-service BI tools
What is a self-service analysis tool? In fact, it’s a BI analysis tool specifically for business people, helping them get rid of the constraints of traditional IT and complete data analysis work independently. For the super rookie, the learning cost and threshold are relatively low, and it is easy to get started.
Taking FineReport as an example, it is a BI reporting tool that can connect to various data sources, quickly analyze the data, and produce a variety of reports and dashboards. Its designer interface is similar to Excel, and you can build real-time reports through simple drag-and-drop operations. Its data entry system and decision-making platform provide data reporting, process approval, and permission management, so it can respond flexibly to business needs in areas such as operations, human resources, finance, and contracts.
In fact, FineReport is like a combined version of Excel and Tableau. It can produce a variety of complex reports. At the same time, it also advocates visual exploratory analysis. It is a bit like an enhanced version of PivotTable. The visualization component library of FineReport is very rich. It can be used as a portal for data reporting, or as a platform for business analysis.
For newbies, a tool that is easy to learn yet analytically powerful is hard to beat. More importantly, the personal version of FineReport is completely free, so individuals can use it for self-service analysis.
Of course, other BI tools such as Power BI and QlikView also have their own advantages. If you want to learn more about self-service BI tools, you can take a look at this review: 5 Most Popular Business Intelligence (BI) Tools in 2019, to understand your own needs and then choose the tool that is right for you.
2. SQL
Structured Query Language (SQL) is the standard language for communicating with relational databases: it is used to query, insert, update, and manage the data they hold. Common relational database management systems include SQL Server, MySQL, Oracle, MS Access, and DB2, and nearly all of them use SQL. Companies generally store their data in on-premises databases or in the public cloud. Some use relational systems such as MySQL or Oracle, some use NoSQL databases such as MongoDB or HBase, and others keep data in big data file formats such as Parquet.
I recommend that beginners learn SQL well first, and then get to know HBase and Parquet as needed.
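To give a feel for what a typical analyst query looks like, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The orders table, its columns, and the top_customers helper are made up for illustration; a real company database will have its own schema and connection details.

```python
import sqlite3

def top_customers(rows, limit=2):
    """Load (customer, amount) rows into a temporary table and
    return the customers with the highest total spend."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical table used only for this example.
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            ORDER BY total DESC
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()

sample_orders = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0)]
print(top_customers(sample_orders))
```

SELECT, GROUP BY, ORDER BY, and LIMIT cover a large share of everyday analysis queries, so they are a reasonable place for a beginner to start.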
3. Programming Languages
Python and R are the two most widely used programming languages in the field of data analysis. I think both are suitable as the core language of data analysis, but it is better to choose one to learn.
Since many people have asked me questions about Python, and I also work with Python myself, here I will talk about the advantages and disadvantages of using Python for data analysis.
As a high-level programming language, Python's biggest disadvantage is that it is not well suited to low-level, systems programming. Apart from that, Python can do almost anything. In data analysis, everything from database operations, data cleaning, and data visualization to machine learning, batch processing, scripting, model optimization, and deep learning can be done in Python, and there are several libraries to choose from for each task.
In addition, Jupyter Notebook is also an excellent interactive tool for data analysis and provides a convenient experimental platform for beginners.
4. Data Analysis Libraries
In addition to the three types of tools mentioned above, there is actually a type of data analysis library that is more suitable for advanced data analysts. If you are still a newbie, you can ignore this section.
Pandas is a Python data science library that is under active development. Its data structures, chiefly the DataFrame, are well suited to processing tabular data. Pandas provides a large number of analysis methods, along with descriptive statistics and basic plotting. If you use Python for data analysis, most of the data preprocessing work will be done with Pandas.
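As a rough illustration of that kind of preprocessing, the sketch below drops rows with missing values and then summarizes sales by region. The column names, the sample data, and the clean_and_summarize helper are invented for this example.

```python
import pandas as pd

# Made-up sample data with one missing sales value.
raw = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "sales": [100.0, None, 150.0, 200.0, 200.0],
})

def clean_and_summarize(df):
    """Drop rows without a sales figure, then compute the number of
    records and the mean sales for each region."""
    cleaned = df.dropna(subset=["sales"])
    return cleaned.groupby("region")["sales"].agg(["count", "mean"])

summary = clean_and_summarize(raw)
print(summary)
```

Whether to drop missing rows or fill them in depends on the data and the business question; dropping is simply the shortest option to show here.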
NumPy is a numerical calculation library for Python. Many analysis libraries, including Pandas, are built on NumPy.
The core features of NumPy include:
- ndarray, a fast and space-saving multidimensional array with vectorized arithmetic operations.
- Standard mathematical functions for fast operations on entire sets of data (no need to write loops).
- Tools for reading and writing array data on disk and for working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform functions.
- A C API for integrating code written in languages such as C, C++, and Fortran.
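Several of the features listed above can be seen in a few lines. This is only a small sketch with made-up numbers, covering vectorized arithmetic, a standard mathematical function, linear algebra, and random number generation.

```python
import numpy as np

# Vectorized arithmetic: multiply two arrays element by element, no loop.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])
revenue = prices * quantities
total = revenue.sum()

# A standard mathematical function applied to a whole array at once.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random number generation with a fixed seed for reproducibility.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(revenue, total, roots, x, samples.shape)
```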
NumPy is especially important for numerical calculations because it can efficiently process large arrays of data. This is because:
- NumPy arrays use less memory than Python’s built-in sequences.
- NumPy can perform complex calculations on entire arrays without the need for Python for loops.
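Both points can be checked with a quick experiment. The sketch below compares a Python list of integers with a NumPy array holding the same values. The list figure is only an approximation (it adds up the list object and each integer object), and exact numbers vary by Python version and platform.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Approximate memory: the list stores pointers plus separate int objects,
# while the array stores raw 8-byte integers in one contiguous buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
array_bytes = np_array.nbytes

# Same calculation, with and without an explicit Python loop.
loop_result = [v * 2 for v in py_list]
vector_result = np_array * 2

print(f"list:  about {list_bytes:,} bytes")
print(f"array: {array_bytes:,} bytes")
```

The vectorized version is also usually much faster, because the loop runs in compiled code inside NumPy rather than in the Python interpreter.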
Alright, that’s it for today’s introduction to data analysis tools. If you want a more comprehensive getting started guide for data analysis, you can refer to the following articles: