What Does a Data Modeler Do?
Data modeling is the abstract organization of real-world data: deciding the scope of the database, how the data is organized, and so on, until the result is implemented as an actual database. The conceptual model abstracted during system analysis is converted into a physical model, and the database entities (generally tables) and the relationships between them are then built in a tool such as Visio or ERwin.
- Data modeling is a process for defining and analyzing data requirements and the information systems needed to support them. The work of a professional data modeler is therefore closely tied to the interests of the enterprise and to its users' information systems.
- From requirements to the actual database, there are three types of data model. The conceptual data model is essentially a set of initial specifications that record the data requirements; it is used first to discuss those requirements with the business. It is then translated into a logical data model, which describes data structures that can be implemented in a database. Implementing one conceptual data model may require multiple logical data models. The final step is to transform the logical data model into a physical data model, which settles the specific requirements for data access, performance, and storage. Data modeling defines not only the data elements but also their structure and the relationships between them [1].
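As a minimal sketch of the last step, the snippet below turns a small logical model into a physical one using Python's built-in `sqlite3` module. The two entities (`customer` and `customer_order`), their columns, and the index are hypothetical and chosen only for illustration; a real physical model would depend on the target database and its access patterns.

```python
import sqlite3

# Hypothetical physical model: two entities (tables) and one relationship
# (each order belongs to exactly one customer).
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    placed_on   TEXT NOT NULL
);
-- Physical-level decision: index the column used to look up a customer's orders.
CREATE INDEX idx_order_customer ON customer_order(customer_id);
"""

def build_database():
    """Create an in-memory database that implements the model above."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn

def orders_per_customer(conn):
    """Return (customer name, number of orders) pairs, following the relationship."""
    return conn.execute(
        "SELECT c.name, COUNT(o.order_id) "
        "FROM customer c "
        "LEFT JOIN customer_order o ON o.customer_id = c.customer_id "
        "GROUP BY c.customer_id ORDER BY c.customer_id"
    ).fetchall()
```

The data types, the `NOT NULL` constraints, and the index are physical-model concerns; the entities and the one-to-many relationship between them come from the logical model.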
- 1. Use a computer to describe the behavior of a system. For example, a spreadsheet program can be used to process financial data, represent the company's behavior, develop business plans, and assess the impact of changes in the company's operations.
- 2. Use a computer to
- The main activities in the modeling process include:
- Identify the data and its related processes (for example, a field salesperson needs to review the online product catalog and submit a new customer order).
- Define data such as data type, size, and
- Selecting and reconstructing variables
- Before modeling, the first thing to decide is which variables to build the model from. This needs to be considered from two angles, business logic and data logic:
- Business logic: the variables are derived from collected data, and the way that data is collected follows business-related logic.
- Data logic: usually judged by the variable's completeness, how concentrated its values are, and whether it is strongly related (or even causally related) to other variables. For example, a variable may be valuable to the business but have a missing rate of 90%, or a non-Boolean variable may be concentrated on just two values; in such cases we must consider whether including the variable adds value to the subsequent analysis.
- When selecting variables, business logic should take precedence over data logic, because business logic arises naturally from the actual situation and the modeling results are in turn fed back into that situation.
- When a variable is not suitable for modeling directly, it needs to be reconstructed. For example, if satisfaction in a questionnaire is recorded as the text labels "unsatisfied", "general", and "satisfactory", it should be recoded as the numbers 1 (unsatisfied), 2 (general), and 3 (satisfactory), which are convenient for subsequent modeling.
- Besides this kind of recoding, calculations on a single variable (such as averaging) and calculations that combine variables (such as A * B) are also common reconstruction methods, and there are many others.
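The data-logic checks and the recoding described above can be sketched in plain Python. The function names, the use of `None` for a missing answer, and the choice to measure concentration as the share of the two most frequent values are assumptions made for illustration, not part of any particular tool.

```python
from collections import Counter

# Mapping from the questionnaire's text labels to numeric codes.
SATISFACTION_CODES = {"unsatisfied": 1, "general": 2, "satisfactory": 3}

def missing_rate(values):
    """Fraction of entries that are missing (None is assumed to mean missing)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def top_k_share(values, k=2):
    """Share of non-missing entries taken up by the k most frequent values."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    counts = Counter(present)
    return sum(count for _, count in counts.most_common(k)) / len(present)

def recode_satisfaction(labels):
    """Replace text labels with 1, 2, 3; unknown or missing labels become None."""
    return [SATISFACTION_CODES.get(label) for label in labels]

def combine(a_values, b_values):
    """Combined calculation A * B, computed row by row."""
    return [a * b for a, b in zip(a_values, b_values)]
```

A variable with a high `missing_rate`, or a non-Boolean variable whose `top_k_share` is close to 1, is a candidate for the closer look described above; where to draw the line remains a judgment call that business logic should inform.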
- Selecting an algorithm
- The goal of modeling is to solve a business problem, not to model for its own sake, so we need to choose a suitable algorithm. Commonly used modeling algorithms include association analysis, clustering, classification (for example, decision trees), time series, regression, and neural networks.
- Taking consumer modeling as an example, here are algorithms commonly used in some scenarios:
- Consumer segmentation: clustering, classification;
- Shopping basket analysis: association analysis, clustering;
- Purchase forecast: regression, time series;
- Satisfaction survey: regression, clustering, classification;
- and many more.
- After choosing the algorithm, check whether the variables meet its requirements. If they do not, go back to selecting and reconstructing variables; if they do, move on to the next step.
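The check in this step might look like the sketch below. It assumes, purely for illustration, that the chosen algorithm is a distance-based one that accepts only numeric columns with no missing values; the real requirements depend on the algorithm and the tool. The table layout (a dictionary of column name to list of values) and the function name are likewise hypothetical.

```python
from numbers import Real

def blocking_columns(table):
    """Return {column: reason} for columns that fail the assumed requirements.

    Assumed requirements (hypothetical): every value is present and numeric.
    """
    problems = {}
    for name, values in table.items():
        if any(v is None for v in values):
            problems[name] = "has missing values"
        elif not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            problems[name] = "is not numeric"
    return problems
```

An empty result means the variables can go on to the next step; otherwise each listed column goes back to variable selection and reconstruction.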
- Setting parameters
- Once the algorithm is selected, the model is built with a data analysis tool. Different models need different parameters to be set. For K-means clustering, for example, you need to specify the number of clusters, and beyond that the initial cluster centers and the maximum number of iterations.
- These parameters will be adjusted many times in subsequent tests; it is rare for the first run to succeed.
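To show where these three parameters enter, here is a minimal plain-Python sketch of K-means (Lloyd's algorithm). It is not the implementation of any particular analysis tool; the function names are hypothetical, and the number of clusters is given implicitly by the number of initial centers.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

def kmeans(points, initial_centers, max_iter=100):
    """Run K-means; k is the number of initial centers supplied."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New center is the coordinate-wise mean of the cluster's members.
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(old_center)
        if new_centers == centers:
            break  # Centers stopped moving: converged before max_iter.
        centers = new_centers
    # Final labels are computed against the final centers.
    return centers, assign(points, centers)
```

Changing the initial centers or their count can change the result, which is one reason these parameters usually go through several rounds of adjustment.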
- Running the algorithm and testing the results
- After the algorithm has run, judge from its output whether it can solve the problem. For example, if the K-means result is poor, consider using hierarchical clustering instead; if a regression model does not meet the need, consider a time series model.
- If the algorithm does not need to be changed, check whether its output has room for improvement. For example, a clustering run may have been told to produce four groups of people, but two of the groups turn out to have very similar characteristics, or one group has no distinctive characteristics. In that case, adjust the parameters and try again.
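One simple way to spot groups whose characteristics are too close is to compare the distances between cluster centers. The sketch below assumes each cluster is summarized by a numeric center, and the `min_distance` threshold is a hypothetical value the analyst would choose for the data at hand.

```python
import math
from itertools import combinations

def close_cluster_pairs(centers, min_distance):
    """Return index pairs of clusters whose centers lie closer than min_distance."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(centers), 2)
        if math.dist(a, b) < min_distance
    ]
```

A non-empty result suggests reducing the number of clusters or changing the initial centers before running the algorithm again.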
- As the parameters are adjusted and the model is optimized, its interpretability and practical usefulness keep improving. When you judge that the model meets the target needs, you can output the results. A report, a set of rules, or a piece of code can all be the output of a model. After the output there is one final step: collecting feedback from business people to see whether the model solves their problem [4].
- The above is the general process of modeling.