DataMiningPratice(3)-RefiningUnnecessaryColumns
Adding new columns and refining unnecessary columns.
Change size classification to exclusive area column
The size
category column contains information on exclusive area. Since the exclusive area
is more intuitive than the size category, create a new column called exclusive area and subtract phrases such as excess
, or less
from the existing size classification value to make it concise.
At this time, using the replace function of str, for example, if “exclusive area exceeds 60㎡ and is less than 85㎡”, change it to “60㎡~85㎡”.
- More information of Pandas’s string handling function : https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling
View the unique value of the scale division.
df_last["규모구분"].unique() # Scare Division
array(['전체', '전용면적 60㎡이하', '전용면적 60㎡초과 85㎡이하', '전용면적 85㎡초과 102㎡이하',
'전용면적 102㎡초과'], dtype=object)
Change the scale division(전용면적)
to exclusive area
df_last["규모구분"].replace("전용면적", "")
0 전체
1 전용면적 60㎡이하
2 전용면적 60㎡초과 85㎡이하
3 전용면적 85㎡초과 102㎡이하
4 전용면적 102㎡초과
...
4330 전체
4331 전용면적 60㎡이하
4332 전용면적 60㎡초과 85㎡이하
4333 전용면적 85㎡초과 102㎡이하
4334 전용면적 102㎡초과
Name: 규모구분, Length: 4335, dtype: object
df_last["Exclusive Area"] = df_last["규모구분"].str.replace("Exclusive Area", "")
# Once you entered `str` it's working well
df_last["Exclusive Area"] = df_last["Exclusive Area"].str.replace("전용면적", "")
df_last["Exclusive Area"] = df_last["Exclusive Area"].str.replace("초과", "~")
df_last["Exclusive Area"] = df_last["Exclusive Area"].str.replace("이하", "")
df_last["Exclusive Area"] = df_last["Exclusive Area"].str.replace(" ", "").str.strip()
df_last["Exclusive Area"]
0 전체
1 60㎡
2 60㎡~85㎡
3 85㎡~102㎡
4 102㎡~
...
4330 전체
4331 60㎡
4332 60㎡~85㎡
4333 85㎡~102㎡
4334 102㎡~
Name: Exclusive Area, Length: 4335, dtype: object
Remove unnecessary columns
Remove the column pretreated by drop. Methods related to data frames in pandas sometimes require an axis option, which means which row or column to process. It usually defaults to 0, meaning processing is row-by-row. Check if the memory usage is reduced.
See the information of the list with .info()
and .head(1)
.
df_last.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4335 entries, 0 to 4334
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 지역명 4335 non-null object
1 연도 4335 non-null int64
2 월 4335 non-null int64
3 분양가격 3957 non-null float64
4 평당분양가격 3957 non-null float64
5 Exclusive Area 4335 non-null object
6 규모구분 4335 non-null object
dtypes: float64(2), int64(2), object(3)
memory usage: 237.2+ KB
df_last.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4335 entries, 0 to 4334
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 지역명 4335 non-null object
1 연도 4335 non-null int64
2 월 4335 non-null int64
3 분양가격 3957 non-null float64
4 평당분양가격 3957 non-null float64
5 Exclusive Area 4335 non-null object
6 규모구분 4335 non-null object
dtypes: float64(2), int64(2), object(3)
memory usage: 237.2+ KB
df_last.head(1)
지역명 | 연도 | 월 | 분양가격 | 평당분양가격 | Exclusive Area | 규모구분 | |
---|---|---|---|---|---|---|---|
0 | 서울 | 2015 | 10 | 5841.0 | 19275.3 | 전체 | 서울 |
Use drop
to delete unnecessary columns.
- Pay attention to the axis when using drop.
- axis 0: row, 1: column
df_last = df_last.drop(["규모구분", "분양가격"], axis=1)
# Check if the columns have been successfully deleted.
df_last.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4335 entries, 0 to 4334
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 지역명 4335 non-null object
1 연도 4335 non-null int64
2 월 4335 non-null int64
3 평당분양가격 3957 non-null float64
4 Exclusive Area 4335 non-null object
dtypes: float64(1), int64(2), object(2)
memory usage: 169.5+ KB
# Check if memory usage is reduced by removing columns.
df_last.head()
지역명 | 연도 | 월 | 평당분양가격 | Exclusive Area | |
---|---|---|---|---|---|
0 | 서울 | 2015 | 10 | 19275.3 | 전체 |
1 | 서울 | 2015 | 10 | 18651.6 | 60㎡ |
2 | 서울 | 2015 | 10 | 19410.6 | 60㎡~85㎡ |
3 | 서울 | 2015 | 10 | 18879.3 | 85㎡~102㎡ |
4 | 서울 | 2015 | 10 | 19400.7 | 102㎡~ |
Leave a comment