Posts

Data Engineering Tools

Data is the new oil. To channel this data, we need proper pipelines, usually called ETL pipelines or data pipelines. There are various tools available for performing these Extract, Transform, and Load (ETL) operations.

Common scripting and query languages include:
- Python
- SQL and NoSQL

Apache provides a wide range of projects that can be used as data engineering (DE) tools. Some of the most popular Apache DE projects are:
- Hadoop
- Spark
- Kafka
- Cassandra
- Hive

In addition, some other useful DE tools are:
- Tableau
- Talend
- MapReduce

These tools help us manage data pipelines more effectively and efficiently. Many data-oriented companies depend heavily on these technologies and use them to build reliable data services and architectures.

That's it for today. I hope you enjoyed this article. If you want to know more about data or software-related concepts, let me know in th
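To make the Extract, Transform, and Load steps concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the sample CSV data, the `orders` table, and the `extract`/`transform`/`load` function names are hypothetical, and it uses only Python's standard library (`csv`, `sqlite3`) rather than any of the tools listed above, which do the same kind of work at much larger scale.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in a real pipeline this would come from a file, API, or database.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,
3,carol,75.00
"""

def extract(text):
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].title(), float(row["amount"])))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
# [('Alice', 120.5), ('Carol', 75.0)]
```

Keeping each stage in its own function mirrors the three ETL steps, so any one of them can later be swapped for a larger-scale tool without rewriting the others.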

Applications of Data Engineering

In previous blog posts about data engineering, we covered several concepts such as data modeling, and we looked at different types of databases, including relational and NoSQL databases. In today's blog post, we will write about data engineering applications, which are basically the use cases of data engineering. First, let's see what data engineering applications mean. Once we have data and have engineered it, there are several things we can do with that processed data, and these things are the use cases of data engineering. For example, if we have filled our containers with data, that data is not of much use unless it is clean and well-formed. If the data is well-formed, with no logical mistakes, missing values, or garbage, then we can easily perform the use cases and applications of data enginee
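The cleaning step described above can be sketched in Python. This is a minimal, hypothetical example: the records, the `is_valid`/`clean` helpers, and the assumed upper bound of 130 for an age are illustrative choices, not rules from the original post. It drops records with missing values, garbage, or logical mistakes, and then uses the clean data for a simple downstream task (an average).

```python
# Hypothetical raw records containing the kinds of problems mentioned above.
raw_records = [
    {"name": "Alice", "age": "34"},
    {"name": "Bob", "age": ""},       # missing value
    {"name": "###", "age": "abc"},    # garbage
    {"name": "Carol", "age": "200"},  # logical mistake: impossible age
    {"name": "Dave", "age": "41"},
]

def is_valid(record):
    """Return True if a record is complete, well-formed, and logically plausible."""
    name, age = record.get("name", ""), record.get("age", "")
    if not name or not age:
        return False  # missing values
    if not name.isalpha() or not age.isdigit():
        return False  # garbage or malformed values
    return int(age) <= 130  # assumed plausibility rule for ages

def clean(records):
    """Keep only valid records and convert ages to integers."""
    return [{"name": r["name"], "age": int(r["age"])} for r in records if is_valid(r)]

cleaned = clean(raw_records)

# A simple downstream use case that only makes sense on clean data.
average_age = sum(r["age"] for r in cleaned) / len(cleaned)
print(cleaned)
print(average_age)
# [{'name': 'Alice', 'age': 34}, {'name': 'Dave', 'age': 41}]
# 37.5
```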

NoSQL database

Data is everywhere, and it is growing with each passing day. Relational database systems are built around fixed, predefined schemas. When data arrives in abundance and without any standard structure, the tables of relational database systems start to struggle. So what should we do with data integrated from so many structured and unstructured sources? One answer is NoSQL (Not Only SQL). SQL (Structured Query Language) is the language used to query relational databases. NoSQL databases, however, do not store data in the form of relations (tables). Many NoSQL databases, especially document stores, store data in a JSON-like structure.

Formal Definition:

Guru99 provides a pretty nice definition of NoSQL databases: "NoSQL Database is a non-relational Data Management System, that does not require a fixed schema. It avoids joins and is easy to scale. The major purpose of using a NoSQL database is for distributed data stores with humongous data storage needs. NoSQL is used for Big data an
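To illustrate the schema-less, JSON-like storage described above, here is a toy in-memory sketch in Python. It is not a real database: the `collection` list and the `insert`/`find` helpers are hypothetical, and real document stores (MongoDB is one example) add indexing, distribution, and much more. It shows two ideas from the definition: documents in the same collection do not need the same fields, and related data (Bob's orders) can be embedded in one document instead of being joined from another table.

```python
import json

# Hypothetical in-memory "collection": each document is stored as a JSON string.
collection = []

def insert(doc):
    """Store a document; no fixed schema is enforced."""
    collection.append(json.dumps(doc))

def find(**criteria):
    """Return the documents whose top-level fields match every key/value pair given."""
    matches = []
    for raw in collection:
        doc = json.loads(raw)
        if all(doc.get(key) == value for key, value in criteria.items()):
            matches.append(doc)
    return matches

# Documents in the same collection can have different fields.
insert({"name": "Alice", "city": "Berlin"})
insert({"name": "Bob", "city": "Berlin", "orders": [{"id": 1, "total": 99.0}]})  # embedded, no join needed
insert({"name": "Carol", "email": "carol@example.com"})

for doc in find(city="Berlin"):
    print(doc)
```

Carol's document has no `city` field at all, so it simply does not match the query; a relational table would instead need a `city` column for every row.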

Relational Data Model

In the previous blog post, we covered data modeling and saw that it is really important before jumping into predictive modeling. In this article, we will cover the Relational Data Model (RDM). Let's start with the very basics: "relational" suggests something that is linked with something else, and such a link is called a "relationship". (Strictly speaking, in the relational model a "relation" is a table, and relationships between tables are expressed through keys.) In the RDM (or RM), there are three major types of relationship cardinality (also called multiplicity):
1. One to one
2. One to many
3. Many to many

We can understand these with a few simple examples:
One to one: one department has one manager.
One to many: one class has many students.
Many to many: many teachers can teach many students.

Okay, so we now have an abstract overview of the term "relational". The next term is "data", which covers everything and anything. Finally, "model" means something in the form of a template or design, a kind of blueprint. To store
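The three cardinalities can be expressed directly in a relational schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (such as `school_class` and the junction table `teaches`) and the sample rows are hypothetical. A one-to-one link is a UNIQUE foreign key, a one-to-many link is a foreign key on the "many" side, and a many-to-many link needs a separate junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
-- One to one: each department has exactly one manager (UNIQUE foreign key).
CREATE TABLE manager (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE department (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id INTEGER NOT NULL UNIQUE REFERENCES manager(id)
);

-- One to many: one class has many students (foreign key on the "many" side).
CREATE TABLE school_class (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE student (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    class_id INTEGER REFERENCES school_class(id)
);

-- Many to many: teachers and students linked through a junction table.
CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE teaches (
    teacher_id INTEGER NOT NULL REFERENCES teacher(id),
    student_id INTEGER NOT NULL REFERENCES student(id),
    PRIMARY KEY (teacher_id, student_id)
);
""")

conn.executemany("INSERT INTO manager VALUES (?, ?)", [(1, "Sara"), (2, "Omar")])
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [(1, "Physics", 1), (2, "History", 2)])
conn.execute("INSERT INTO school_class VALUES (1, 'Math 101')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [(1, "Ali", 1), (2, "Mina", 1)])
conn.executemany("INSERT INTO teacher VALUES (?, ?)", [(1, "Mr. Khan"), (2, "Ms. Lee")])
conn.executemany("INSERT INTO teaches VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# One to many: count the students in class 1.
students_in_class = conn.execute("SELECT COUNT(*) FROM student WHERE class_id = 1").fetchone()[0]

# Many to many: find all teachers of student 1 through the junction table.
teachers_of_ali = [row[0] for row in conn.execute(
    "SELECT t.name FROM teacher t JOIN teaches ts ON ts.teacher_id = t.id "
    "WHERE ts.student_id = 1 ORDER BY t.name")]

# One to one: a second department cannot reuse manager 1 because of the UNIQUE constraint.
try:
    conn.execute("INSERT INTO department VALUES (3, 'Chemistry', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(students_in_class, teachers_of_ali, duplicate_rejected)
# 2 ['Mr. Khan', 'Ms. Lee'] True
```

The junction table is the usual design choice for many-to-many relationships, because a single foreign-key column can only point to one row on the other side.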

Data Modeling

Source: https://searchdatamanagement.techtarget.com/definition/data-modeling

A wise person once said that a picture is worth a thousand words. In the real world, data is not just a few rows in an Excel sheet; it can comprise several billion rows spread across hundreds of tables. A table is usually made up of rows and columns. Each table represents an Entity, meaning an object or thing. An entity has several properties called Attributes, which represent its qualities or characteristics. Since there is more than one entity, there are Relationships between those entities. Just by looking at the tables, we cannot get a clear idea of how the tables/entities relate to one another. Hence, we need to represent the tables/data in the form of a picture or a flow graph. That flow graph, or Entity-Relationship (ER) diagram, is called a Data Model. Data Modeling is a technique to represent how the tables are related to each other and to represent the structure of da
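A data model can also be written down in code before any tables exist. The sketch below is a hypothetical, minimal way to record entities, their attributes, and the relationships between them, and to print a plain-text stand-in for an ER diagram. The `Entity`, `Relationship`, and `DataModel` classes and the Customer/Order example are assumptions for illustration, not part of the original post, and it assumes Python 3.9 or later for the `list[str]` annotations.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """An entity (usually one table) and its attributes (columns)."""
    name: str
    attributes: list[str]

@dataclass
class Relationship:
    """A named relationship between two entities with a cardinality such as '1:N'."""
    source: Entity
    verb: str
    cardinality: str
    target: Entity

@dataclass
class DataModel:
    """A tiny data model: a set of entities plus the relationships between them."""
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

    def describe(self) -> str:
        """Render a plain-text stand-in for an ER diagram."""
        lines = [f"{e.name}({', '.join(e.attributes)})" for e in self.entities]
        lines += [
            f"{r.source.name} --{r.verb} [{r.cardinality}]--> {r.target.name}"
            for r in self.relationships
        ]
        return "\n".join(lines)

customer = Entity("Customer", ["customer_id", "name", "email"])
order = Entity("Order", ["order_id", "order_date", "customer_id"])
model = DataModel([customer, order], [Relationship(customer, "places", "1:N", order)])
print(model.describe())
# Customer(customer_id, name, email)
# Order(order_id, order_date, customer_id)
# Customer --places [1:N]--> Order
```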