38. Expenses and income example with Pandas and Python
By Bernd Klein. Last modified: 26 Apr 2023.
In this chapter of our Pandas tutorial we will deal with simple Expense and Income tables for private usage. Some say that if you want to manage your money successfully you will have to track your income and expenses. Monitoring the flow of money is important to understand your financial situation. You need to know exactly how much is coming in and going out. This article is not meant to convince you of the necessity of doing this. The main focus is rather on the possibilities offered by Python and Pandas to program the necessary tools.
We will start with a small example, suitable for private purposes and the following chapter of our tutorial continues with a more extensive example suitable for small businesses.
Private Budgeting with Python and Pandas
Let us assume that you have already a csv file containing your expenses and income over a certain period of time. This journal may look like this:
Date;Description;Category;Out;In
2020-06-02;Salary Frank;Income;0;4896.44
2020-06-03;supermarket;food and beverages;132.40;0
2020-06-04;Salary Laura;Income;0;4910.14
2929-06-04;GreenEnergy Corp., (electricity);utility;87.34;0
2020-06-09;water and sewage;utility;60.56;0
2020-06-10;Fitness studio, Jane;health and sports;19.00;0
2020-06-11;payment to bank;monthly redemption payment;1287.43;0
2020-06-12;LeGourmet Restaurant;restaurants and hotels;145.00;0
2020-06-13;supermarket;food and beverages;197.42;0
2020-06-13;Pizzeria da Pulcinella;restaurants and hotels;60.00;0
2020-06-26;supermarket;food and beverages;155.42;0
2020-06-27;theatre tickets;education and culture;125;0
2020-07-02;Salary Frank;Income;0;4896.44
2020-07-03;supermarket;food and beverages;147.90;0
2020-07-05;Salary Laura;Income;0;4910.14
2020-07-08;Golf Club, yearly payment;health and sports;612.18;0
2020-07-09;house insurance;insurances and taxes;167.89;0
2020-07-10;Fitness studio, Jane;health and sports;19.00;0
2020-07-10;supermarket;food and beverages;144.12;0
2020-07-11;payment to bank;monthly redemption payment;1287.43;0
2020-07-18;supermarket;food and beverages;211.24;0
2020-07-13;Pizzeria da Pulcinella;restaurants and hotels;33.00;0
2020-07-23;Cinema;education and culture;19;0
2020-07-25;supermarket;food and beverages;186.11;0
The above mentioned csv file is saved under the name expenses_and_income.csv
in the folder data
. It is easy to read it in with Pandas as we can see in our chapter Pandas Data Files:
import pandas as pd
exp_inc = pd.read_csv("/data1/expenses_and_income.csv", sep=";")
exp_inc
Date | Description | Category | Out | In | |
---|---|---|---|---|---|
0 | 2020-06-02 | Salary Frank | Income | 0.00 | 4896.44 |
1 | 2020-06-03 | supermarket | food and beverages | 132.40 | 0.00 |
2 | 2020-06-04 | Salary Laura | Income | 0.00 | 4910.14 |
3 | 2929-06-04 | GreenEnergy Corp., (electricity) | utility | 87.34 | 0.00 |
4 | 2020-06-09 | water and sewage | utility | 60.56 | 0.00 |
5 | 2020-06-10 | Fitness studio, Jane | health and sports | 19.00 | 0.00 |
6 | 2020-06-11 | payment to bank | monthly redemption payment | 1287.43 | 0.00 |
7 | 2020-06-12 | LeGourmet Restaurant | restaurants and hotels | 145.00 | 0.00 |
8 | 2020-06-13 | supermarket | food and beverages | 197.42 | 0.00 |
9 | 2020-06-13 | Pizzeria da Pulcinella | restaurants and hotels | 60.00 | 0.00 |
10 | 2020-06-26 | supermarket | food and beverages | 155.42 | 0.00 |
11 | 2020-06-27 | theatre tickets | education and culture | 125.00 | 0.00 |
12 | 2020-07-02 | Salary Frank | Income | 0.00 | 4896.44 |
13 | 2020-07-03 | supermarket | food and beverages | 147.90 | 0.00 |
14 | 2020-07-05 | Salary Laura | Income | 0.00 | 4910.14 |
15 | 2020-07-08 | Golf Club, yearly payment | health and sports | 612.18 | 0.00 |
16 | 2020-07-09 | house insurance | insurances and taxes | 167.89 | 0.00 |
17 | 2020-07-10 | Fitness studio, Jane | health and sports | 19.00 | 0.00 |
18 | 2020-07-10 | supermarket | food and beverages | 144.12 | 0.00 |
19 | 2020-07-11 | payment to bank | monthly redemption payment | 1287.43 | 0.00 |
20 | 2020-07-18 | supermarket | food and beverages | 211.24 | 0.00 |
21 | 2020-07-13 | Pizzeria da Pulcinella | restaurants and hotels | 33.00 | 0.00 |
22 | 2020-07-23 | Cinema | education and culture | 19.00 | 0.00 |
23 | 2020-07-25 | supermarket | food and beverages | 186.11 | 0.00 |
By reading the CSV file, we created a DataFrame object. What can we do with it, or in other words: what information interests Frank and Laura? Of course they are interested in the account balance. They want to know what the total income was and they want to see the total of all expenses.
The balances of their expenses and incomes can be easily calculated by applying the sum on the DataFrame exp_inc[['Out', 'In']]
:
exp_inc[['Out', 'In']].sum()
OUTPUT:
Out 5097.44 In 19613.16 dtype: float64
What other information do they want to gain from the data? They might be interested in seeing the expenses summed up according to the different categories. This can be done using groupby and sum:
category_sums = exp_inc.groupby("Category").sum()
category_sums
Out | In | |
---|---|---|
Category | ||
Income | 0.00 | 19613.16 |
education and culture | 144.00 | 0.00 |
food and beverages | 1174.61 | 0.00 |
health and sports | 650.18 | 0.00 |
insurances and taxes | 167.89 | 0.00 |
monthly redemption payment | 2574.86 | 0.00 |
restaurants and hotels | 238.00 | 0.00 |
utility | 147.90 | 0.00 |
category_sums.index
OUTPUT:
Index(['Income', 'education and culture', 'food and beverages', 'health and sports', 'insurances and taxes', 'monthly redemption payment', 'restaurants and hotels', 'utility'], dtype='object', name='Category')
import matplotlib.pyplot as plt
ax = category_sums.plot.bar(y="Out")
plt.xticks(rotation=45)
OUTPUT:
We can also display this as a pie chart:
ax = category_sums.plot.pie(y="Out")
ax.legend(loc="upper left", bbox_to_anchor=(1.5, 1))
OUTPUT:
Alternatively, we can create the same pie plot with the following code:
ax = category_sums["Out"].plot.pie()
ax.legend(loc="upper left", bbox_to_anchor=(1.5, 1))
OUTPUT:
If you imagine that you will have to type in all the time category names like "household goods and service" or "rent and mortgage interest", you will agree that it is very likely to have typos in your journal of expenses and income.
So it will be a good idea to use numbers (account numbers) for your categories. The following categories are available in our example.
The following categories are provided:
Category | Account Number |
---|---|
rent and mortgage interest | 200 |
insurances and taxes | 201 |
food and beverages | 202 |
education and culture | 203 |
transport | 204 |
health and sports | 205 |
household goods and services | 206 |
clothing | 207 |
communications | 208 |
restaurants and hotels | 209 |
utility ( heating, electricity, water, and garbage) | 210 |
other expenses | 211 |
income | 400 |
We can implement this as a dictionary mapping categories into account numbers:
category2account = {'monthly redemption payment': '200',
'insurances and taxes': '201',
'food and beverages': '202',
'education and culture': '203',
'transport': '204',
'health and sports': '205',
'household goods and services': '206',
'clothing': '207',
'communications': '208',
'restaurants and hotels': '209',
'utility': '210',
'other expenses': '211',
'Income': '400'}
The next step is to replace our "clumsy" category names with the account numbers. The replace
method of DataFrame
is ideal for this purpose. We can replace all the occurrences of the category names in our DataFrame by the corresponding account names:
exp_inc.replace(category2account, inplace=True)
exp_inc.rename(columns={"Category": "Accounts"}, inplace=True)
exp_inc[:5]
Date | Description | Accounts | Out | In | |
---|---|---|---|---|---|
0 | 2020-06-02 | Salary Frank | 400 | 0.00 | 4896.44 |
1 | 2020-06-03 | supermarket | 202 | 132.40 | 0.00 |
2 | 2020-06-04 | Salary Laura | 400 | 0.00 | 4910.14 |
3 | 2929-06-04 | GreenEnergy Corp., (electricity) | 210 | 87.34 | 0.00 |
4 | 2020-06-09 | water and sewage | 210 | 60.56 | 0.00 |
We will save this DataFrame
object now in an excel file. This excel file will have two sheets: One with the "expenses and income" journal and the other one with the mapping of account numbers to category names. We will turn the category2account dictionary into a Series object for this purpose. The account numbers serve as the index:
account_numbers = pd.Series(list(category2account.keys()), index=category2account.values())
account_numbers.name = "Description"
account_numbers.rename("Accounts")
OUTPUT:
200 monthly redemption payment 201 insurances and taxes 202 food and beverages 203 education and culture 204 transport 205 health and sports 206 household goods and services 207 clothing 208 communications 209 restaurants and hotels 210 utility 211 other expenses 400 Income Name: Accounts, dtype: object
exp_inc.insert(1, "accounts", account_numbers)
with pd.ExcelWriter('/data1/expenses_and_income_2020.xlsx') as writer:
account_numbers.to_excel(writer, "account numbers")
exp_inc.to_excel(writer, "journal")
writer.save()
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
Upcoming online Courses