pandas - 5

2025. 4. 6. 18:02 · Dev Study / Generative AI-Based Developer Course

Group by

Let's look at how to group rows with groupby so that we can extract information from the data.

import pandas as pd

student_list = [{'name': 'John', 'major': "Computer Science", 'sex': "male"},
                {'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
                {'name': 'Abraham', 'major': "Physics", 'sex': "male"},
                {'name': 'Brian', 'major': "Psychology", 'sex': "male"},
                {'name': 'Janny', 'major': "Economics", 'sex': "female"},
                {'name': 'Yuna', 'major': "Economics", 'sex': "female"},
                {'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
                {'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
                {'name': 'Zara', 'major': "Psychology", 'sex': "female"},
                {'name': 'Wendy', 'major': "Economics", 'sex': "female"},
                {'name': 'Sera', 'major': "Psychology", 'sex': "female"}
         ]
df = pd.DataFrame(student_list, columns = ['name', 'major', 'sex'])
df


        name             major     sex
0       John  Computer Science    male
1       Nate  Computer Science    male
2    Abraham           Physics    male
3      Brian        Psychology    male
4      Janny         Economics  female
5       Yuna         Economics  female
6   Jeniffer  Computer Science  female
7     Edward  Computer Science    male
8       Zara        Psychology  female
9      Wendy         Economics  female
10      Sera        Psychology  female
groupby_major = df.groupby('major')
groupby_major.groups
{'Computer Science': Int64Index([0, 1, 6, 7], dtype='int64'),
 'Economics': Int64Index([4, 5, 9], dtype='int64'),
 'Physics': Int64Index([2], dtype='int64'),
 'Psychology': Int64Index([3, 8, 10], dtype='int64')}

Comparing these groups with the table above, we can see that Computer Science students are mostly male (3 of 4), while all three Economics students are female.
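The observation above can also be checked with code instead of by eye. Below is a minimal sketch that groups on both major and sex at once and counts the rows in each pair. It rebuilds the same df so that it runs on its own; the names major_sex_counts and table are chosen here for illustration.

```python
import pandas as pd

# Same student records as in the example above
student_list = [
    {'name': 'John', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Abraham', 'major': "Physics", 'sex': "male"},
    {'name': 'Brian', 'major': "Psychology", 'sex': "male"},
    {'name': 'Janny', 'major': "Economics", 'sex': "female"},
    {'name': 'Yuna', 'major': "Economics", 'sex': "female"},
    {'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
    {'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Zara', 'major': "Psychology", 'sex': "female"},
    {'name': 'Wendy', 'major': "Economics", 'sex': "female"},
    {'name': 'Sera', 'major': "Psychology", 'sex': "female"},
]
df = pd.DataFrame(student_list, columns=['name', 'major', 'sex'])

# Group on two keys; size() counts the rows in each (major, sex) pair
major_sex_counts = df.groupby(['major', 'sex']).size()
print(major_sex_counts)

# unstack() moves the inner 'sex' level into columns;
# fill_value=0 fills in pairs that have no students (e.g. male Economics students)
table = major_sex_counts.unstack(fill_value=0)
print(table)
```

pd.crosstab(df['major'], df['sex']) builds the same counts table in a single call.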

for name, group in groupby_major:
    print(name + ": " + str(len(group)))
    print(group)
    print()
Computer Science: 4
       name             major     sex
0      John  Computer Science    male
1      Nate  Computer Science    male
6  Jeniffer  Computer Science  female
7    Edward  Computer Science    male

Economics: 3
    name      major     sex
4  Janny  Economics  female
5   Yuna  Economics  female
9  Wendy  Economics  female

Physics: 1
      name    major   sex
2  Abraham  Physics  male

Psychology: 3
     name       major     sex
3   Brian  Psychology    male
8    Zara  Psychology  female
10   Sera  Psychology  female
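Looping over every group is handy for inspection, but when only one group is needed, GroupBy.get_group() returns that group's rows directly. A short sketch, assuming the same df as above; economics is an illustrative name.

```python
import pandas as pd

# Same student records as in the example above
student_list = [
    {'name': 'John', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Abraham', 'major': "Physics", 'sex': "male"},
    {'name': 'Brian', 'major': "Psychology", 'sex': "male"},
    {'name': 'Janny', 'major': "Economics", 'sex': "female"},
    {'name': 'Yuna', 'major': "Economics", 'sex': "female"},
    {'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
    {'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Zara', 'major': "Psychology", 'sex': "female"},
    {'name': 'Wendy', 'major': "Economics", 'sex': "female"},
    {'name': 'Sera', 'major': "Psychology", 'sex': "female"},
]
df = pd.DataFrame(student_list, columns=['name', 'major', 'sex'])
groupby_major = df.groupby('major')

# get_group() returns the rows for a single key; the original row labels are kept
economics = groupby_major.get_group('Economics')
print(economics)
print(len(economics))
```

The row labels (4, 5, 9) match the indices listed for 'Economics' in groupby_major.groups.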

The next example turns the group object back into a DataFrame holding the number of students in each group.

df_major_cnt = pd.DataFrame({'count' : groupby_major.size()}).reset_index()
df_major_cnt


              major  count
0  Computer Science      4
1         Economics      3
2           Physics      1
3        Psychology      3
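Because size() returns a Series indexed by major, the same two-column table can also be built by calling Series.reset_index() with a name for the counts column. A minimal sketch reusing the data above; the comparison at the end checks that it matches the dict-based approach.

```python
import pandas as pd

# Same student records as in the example above
student_list = [
    {'name': 'John', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Abraham', 'major': "Physics", 'sex': "male"},
    {'name': 'Brian', 'major': "Psychology", 'sex': "male"},
    {'name': 'Janny', 'major': "Economics", 'sex': "female"},
    {'name': 'Yuna', 'major': "Economics", 'sex': "female"},
    {'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
    {'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Zara', 'major': "Psychology", 'sex': "female"},
    {'name': 'Wendy', 'major': "Economics", 'sex': "female"},
    {'name': 'Sera', 'major': "Psychology", 'sex': "female"},
]
df = pd.DataFrame(student_list, columns=['name', 'major', 'sex'])
groupby_major = df.groupby('major')

# reset_index(name=...) moves the 'major' index into a column
# and names the count column in one step
df_major_cnt = groupby_major.size().reset_index(name='count')
print(df_major_cnt)

# The dict-based version from the example above, for comparison
df_major_cnt_dict = pd.DataFrame({'count': groupby_major.size()}).reset_index()
print(df_major_cnt.equals(df_major_cnt_dict))
```

In recent pandas versions, df.groupby('major', as_index=False).size() returns a similar DataFrame, with the counts in a column named size instead of count.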
groupby_sex = df.groupby('sex')

The output below shows that the ratio of male to female students in this data is roughly even: 6 female and 5 male.

for name, group in groupby_sex:
    print(name + ": " + str(len(group)))
    print(group)
    print()
female: 6
        name             major     sex
4      Janny         Economics  female
5       Yuna         Economics  female
6   Jeniffer  Computer Science  female
8       Zara        Psychology  female
9      Wendy         Economics  female
10      Sera        Psychology  female

male: 5
      name             major   sex
0     John  Computer Science  male
1     Nate  Computer Science  male
2  Abraham           Physics  male
3    Brian        Psychology  male
7   Edward  Computer Science  male
df_sex_cnt = pd.DataFrame({'count' : groupby_sex.size()}).reset_index()
df_sex_cnt


      sex  count
0  female      6
1    male      5
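To put a number on "roughly even", the group counts can be divided by the total number of rows. This sketch reuses the same student records; sex_share is an illustrative name.

```python
import pandas as pd

# Same student records as in the example above
student_list = [
    {'name': 'John', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Abraham', 'major': "Physics", 'sex': "male"},
    {'name': 'Brian', 'major': "Psychology", 'sex': "male"},
    {'name': 'Janny', 'major': "Economics", 'sex': "female"},
    {'name': 'Yuna', 'major': "Economics", 'sex': "female"},
    {'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
    {'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Zara', 'major': "Psychology", 'sex': "female"},
    {'name': 'Wendy', 'major': "Economics", 'sex': "female"},
    {'name': 'Sera', 'major': "Psychology", 'sex': "female"},
]
df = pd.DataFrame(student_list, columns=['name', 'major', 'sex'])

# Each group's size divided by the total row count gives its share of all students
sex_share = df.groupby('sex').size() / len(df)
print(sex_share.round(3))
```

df['sex'].value_counts(normalize=True) returns the same shares, ordered by frequency rather than by group key.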