power bi

Data cleaning or data transformation is an important steps towards data visualization. Power Query is like a superhero for your data! It has lots of cool tools to help you make your data neat and ready for analysis. You can make a messy model look simple, change the way data is shown, give things new names, and rearrange data. Plus, you can check out columns to see which ones have the important info you need for a closer look at your data. It’s like having a toolkit to get your data in tip-top shape!

Overview

Let’s say you brought in data to Power BI from different places, but when you look at it, it’s not ready for studying. What might be wrong with the data that makes it not ready for analysis?

When examining the data, you discover several issues, including:

  • A column only contains numerals.
  • Several columns contain errors.
  • Some columns contain null values.
  • The customer ID in some columns appears as if it was duplicated repeatedly.
  • A single address column has combined street address, city, state, and zip code.

You start working with the data, but every time you create visuals on reports, you get bad data, incorrect results, and simple reports about sales totals are wrong. Fortunately, Power BI and Power Query offer you a powerful environment to clean and prepare the data. Please refer my video as well for better understanding Link.

My Data Cleaning Video on YouTube.

Steps for Data Cleaning in Power BI

  1. Get data into Power Query Editor
  2. Shape the initial data:
    • Understand Your Data.
    • Identify column headers and names (Promote headers).
    • Rename columns
    • Remove unnecessary Rows and Columns.
    • Unpivot columns & pivot columns.
  3. Simplify the Data Structure:
    • Split the columns.
    • Replacing Blank Values.
    • Dealing with Null Values
      – Delete the entire row.
      – Fill Up/Down option.
      – Replace with default value or a calculated value.
      (Mean, Median, Mode Imputation)
    • Remove duplicate values.
    • Format Data (Upper case, trim,…)
  4. Combine multiple tables into a single table:
    • Merging Queries
    • Appending Queries
    • naming tables, columns, and values Also add prefix & suffix.
  5. Profile data in Power BI:
    • Examine data structures
      – Semantic model
      – Discourage implicit measures
      – Manage Relationships
    • Find data anomalies and data statistics.
      – Column distribution
      – Column quality
      – Column profiling
  6. Load data in Power BI Desktop:
  7. Use Advanced Editor to modify M code

Get Data Into Power Query Editor

By clicking on Transform Data option we can take our data to power query editor for data cleaning process. Before that you need to import data from data source. In my case I took excel workbook file. This is the first step in ETL process (to extract the data from data source).

Shape the Initial Data

In Power BI Desktop, the Power Query Editor is like a wizard that helps you change and organize your data. You can do things like giving names to columns or tables, turning words into numbers, getting rid of rows you don’t need, saying which row has the titles, and more. Shaping your data with this editor is crucial to make sure it’s just right for your reports.

Understand your data:

First of all as soon as we import our data from data source to power bi, we must spend some time to understand our data and it’s tables. Understanding data will give you a clear idea about how you want to clean the data, so that no unnecessary items will be remove by mistake.

Identify column headers and names:

The first step in shaping your initial data is to identify the column headers and names within the data and then evaluate where they are located to ensure that they are in the right place. Sometimes a problem can occur with the data in its current state because column headers are in different rows (marked in red), and several columns have undescriptive names, such as Column1, Column2, and so on.

When a table is created in Power BI Desktop, Power Query Editor assumes that all data belongs in table rows. However, a data source might have a first row that contains column names, so promote headers here.

Rename columns:

After you’ve started organizing your data, the next thing to check is the names at the top of each column. You might find that some columns have the wrong names, there are spelling mistakes, or the names don’t follow the same rules. You can fix this by changing the column names in two ways. One way is to right-click on the name, choose “Rename,” type the correct name, and press Enter. Another way is to double-click on the column name and write the right name over it. It’s like giving each column the right label!

Remove unnecessary Rows and Columns:

When shaping your data, you might need to remove some of the top rows, for example, if they are blank or if they contain data that you do not need in your reports.

To remove these excess rows, select Remove Rows > Remove Top Rows on the Home tab.

A key step in the data shaping process is to remove unnecessary columns. It is much better to remove columns as early as possible. One way to remove columns would be to limit the column when you get data from data source. For instance, if you are extracting data from a relational database by using SQL, you would want to limit the column that you extract by using a column list in the SELECT statement.

You can remove columns in two ways. The first method is to select the columns that you want to remove and then, on the Home tab, select Remove Columns.

Unpivot columns & pivot columns:

Unpivoting is a useful feature of Power BI. You can use this feature with data from any data source, but you would most often use it when importing data from Excel. Unpivoting is changing distinct categories of columns into rows.

You can use the Pivot Column feature to convert your flat data into a table that contains an aggregate value for each unique value in a column. For example, you might want to use this feature to summarize data by using different math functions such as Count, Minimum, Maximum, Median, Average, or Sum.

Simplify the Data Structure

When you bring in data from different places into Power BI Desktop, each set of information has its own table and column names. But sometimes, these names might not be easy to understand or might not follow the same rules. To make things simpler and more organized, you can use the Power Query Editor in Power BI Desktop. This helps you change names and tidy up your data so it’s easier for you or others to work with. It’s like giving each part of your data a clear and meaningful label!

Split the columns:

In Power Query we can split the given columns based on delimiters. There is a whole list of other options as well to split the columns.

Replacing blank values:

You can use the Replace Values feature in Power Query Editor to replace any value with another value in a selected column.

Dealing with null values:

Sometimes, your data might have empty spots, like a blank space where there should be a number. For instance, the cost of shipping on a sales order might be empty if it’s the same as zero. If you leave these empty spots as they are, it can mess up calculations like averages. One way to fix this is by changing those empty spots to zero. This way, when you calculate things like averages, it will give you a more accurate result.

Remove duplicate values:

You can also remove duplicates from columns to only keep unique names in a selected column by using the Remove Duplicates feature in Power Query.

Format data option:

There is a Format Data option for each column. We can trim our column, clean for any invalid punctuations, find length, add prefix, suffix.. etc.

Combine Multiple Tables into a Single Table

Being able to mix different sets of information is really helpful. It’s like putting together puzzle pieces! You might want to do this when:

  • There are lots of tables, and it’s getting confusing. You can join them to make things simpler.
  • Some tables do similar jobs, so it makes sense to stick them together.
  • A table has only a few bits of info that can fit better in another table.
  • You want to use parts from different tables to create something new, like a custom column. It’s like building something new with your favorite Lego pieces!

You can combine the tables in two different ways: merging and appending.

When you append queries, you’ll be adding rows of data to another table or query. For example, you could have two tables, one with 300 rows and another with 100 rows, and when you append queries, you’ll end up with 400 rows. When you merge queries, you’ll be adding columns from one table (or query) into another. To merge two tables, you must have a column that is the key between the two tables.

On the Home tab on the Power Query Editor ribbon, select the drop-down list for Append Queries. You can select Append Queries as New, which means that the output of appending will result in a new query or table, or you can select Append Queries, which will add the rows from an existing table into another.

When you merge queries, you’re combining the data from multiple tables into one based on a column that is common between the tables.

Profile Data in Power BI

  • Examine data structures
    – Semantic model
    – Discourage implicit measures
    – Manage Relationships
  • Find data anomalies and data statistics.
    – Column distribution
    – Column quality
    – Column profiling
power bi

Profiling data means taking a close look at the details of the data. It’s like being a detective, finding anything unusual, checking how the data is set up, and looking at the stats, such as how many rows there are, how values are spread, the smallest and largest values, averages, and more. This is really useful because it helps you organize the data in a way that makes it easy to work with. It’s like getting the data ready for a smooth and simple ride when you’re creating reports and doing other tasks!

Load Data in Power BI Desktop

After doing all these steps we can apply changes and load our data into data model and start building visuals and reports.

Please refer my video as well for better understanding Link.

By Akshay Tekam

software developer, Data science enthusiast, content creator.

Leave a Reply

Your email address will not be published. Required fields are marked *