Group by
Let's look at how to group data in order to extract information from it.
import pandas as pd

student_list = [{'name': 'John', 'major': "Computer Science", 'sex': "male"},
{'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
{'name': 'Abraham', 'major': "Physics", 'sex': "male"},
{'name': 'Brian', 'major': "Psychology", 'sex': "male"},
{'name': 'Janny', 'major': "Economics", 'sex': "female"},
{'name': 'Yuna', 'major': "Economics", 'sex': "female"},
{'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
{'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
{'name': 'Zara', 'major': "Psychology", 'sex': "female"},
{'name': 'Wendy', 'major': "Economics", 'sex': "female"},
{'name': 'Sera', 'major': "Psychology", 'sex': "female"}
]
df = pd.DataFrame(student_list, columns = ['name', 'major', 'sex'])
df
| | name | major | sex |
|---|---|---|---|
| 0 | John | Computer Science | male |
| 1 | Nate | Computer Science | male |
| 2 | Abraham | Physics | male |
| 3 | Brian | Psychology | male |
| 4 | Janny | Economics | female |
| 5 | Yuna | Economics | female |
| 6 | Jeniffer | Computer Science | female |
| 7 | Edward | Computer Science | male |
| 8 | Zara | Psychology | female |
| 9 | Wendy | Economics | female |
| 10 | Sera | Psychology | female |
groupby_major = df.groupby('major')
groupby_major.groups
{'Computer Science': Int64Index([0, 1, 6, 7], dtype='int64'),
 'Economics': Int64Index([4, 5, 9], dtype='int64'),
 'Physics': Int64Index([2], dtype='int64'),
 'Psychology': Int64Index([3, 8, 10], dtype='int64')}
Looking up these row indices in the table above, Computer Science is mostly male (3 of 4 students), while all three Economics students are female.
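The `.groups` mapping only lists row indices, so the male/female split per major has to be read off by hand. As a minimal sketch of one way to check it directly, the records can be grouped by both columns at once. The names `students` and `sex_by_major` are hypothetical and not part of the original post; the data repeats the `major` and `sex` columns of the table above.

```python
import pandas as pd

# Same student records as above, reduced to the two columns needed here.
students = pd.DataFrame({
    'major': ['Computer Science', 'Computer Science', 'Physics', 'Psychology',
              'Economics', 'Economics', 'Computer Science', 'Computer Science',
              'Psychology', 'Economics', 'Psychology'],
    'sex': ['male', 'male', 'male', 'male', 'female', 'female',
            'female', 'male', 'female', 'female', 'female'],
})

# Group by both columns, count the rows in each (major, sex) pair,
# then pivot 'sex' into columns; missing pairs are filled with 0.
sex_by_major = students.groupby(['major', 'sex']).size().unstack(fill_value=0)
print(sex_by_major)
```

Each row of the result is a major and each column a sex, so the Computer Science and Economics rows can be compared at a glance.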
for name, group in groupby_major:
    print(name + ": " + str(len(group)))
    print(group)
    print()
Computer Science: 4
name major sex
0 John Computer Science male
1 Nate Computer Science male
6 Jeniffer Computer Science female
7 Edward Computer Science male
Economics: 3
name major sex
4 Janny Economics female
5 Yuna Economics female
9 Wendy Economics female
Physics: 1
name major sex
2 Abraham Physics male
Psychology: 3
name major sex
3 Brian Psychology male
8 Zara Psychology female
10 Sera Psychology female
The following example turns the group object back into a DataFrame.
df_major_cnt = pd.DataFrame({'count' : groupby_major.size()}).reset_index()
df_major_cnt
| | major | count |
|---|---|---|
| 0 | Computer Science | 4 |
| 1 | Economics | 3 |
| 2 | Physics | 1 |
| 3 | Psychology | 3 |
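The same count table can probably be built a little more directly. The sketch below assumes only the `major` column is needed and uses `Series.reset_index(name=...)` to name the count column in one step; `majors` and `major_counts` are hypothetical names chosen for this illustration.

```python
import pandas as pd

# The 'major' column from the student table above.
majors = pd.DataFrame({
    'major': ['Computer Science', 'Computer Science', 'Physics', 'Psychology',
              'Economics', 'Economics', 'Computer Science', 'Computer Science',
              'Psychology', 'Economics', 'Psychology'],
})

# size() returns a Series indexed by major; reset_index(name='count')
# turns that index into a column and names the values column 'count'.
major_counts = majors.groupby('major').size().reset_index(name='count')
print(major_counts)
```

This avoids wrapping the result in a dictionary and a new `pd.DataFrame`, but both approaches should give the same two-column table.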
groupby_sex = df.groupby('sex')
The output below shows that the school's gender ratio is nearly even (6 female, 5 male).
for name, group in groupby_sex:
    print(name + ": " + str(len(group)))
    print(group)
    print()
female: 6
name major sex
4 Janny Economics female
5 Yuna Economics female
6 Jeniffer Computer Science female
8 Zara Psychology female
9 Wendy Economics female
10 Sera Psychology female
male: 5
name major sex
0 John Computer Science male
1 Nate Computer Science male
2 Abraham Physics male
3 Brian Psychology male
7 Edward Computer Science male
df_sex_cnt = pd.DataFrame({'count' : groupby_sex.size()}).reset_index()
df_sex_cnt
| | sex | count |
|---|---|---|
| 0 | female | 6 |
| 1 | male | 5 |
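To express the gender ratio as proportions rather than raw counts, one option is `value_counts(normalize=True)`. This is a small sketch rather than part of the original walkthrough; `sex` and `sex_share` are hypothetical names, and the values repeat the `sex` column of the student table.

```python
import pandas as pd

# The 'sex' column from the student table above, in row order.
sex = pd.Series(['male', 'male', 'male', 'male', 'female', 'female',
                 'female', 'male', 'female', 'female', 'female'], name='sex')

# normalize=True returns each value's share of the total instead of a count.
sex_share = sex.value_counts(normalize=True)
print(sex_share.round(3))
```

With 6 of 11 students female and 5 of 11 male, the shares come out close to one half each.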