scripts/generate-attendance-reports.py at main · Glasgow-ICG/scripts · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
'''
    ####################################################################################
    ######## Script to generate attendance report from Zoom 'participants' logs ########
    ####################################################################################
    Do please drop me an email to let me know if you find this script useful (or have any additions to contribute):
    jonathan.taylor at glasgow.ac.uk
        -- Jonathan Taylor, Glasgow University, October 2020

    Version 2.0 updates: Semi-automated feature to merge personal and university email addresses. Generate report tables by day and by week.
    Version 3.0 update:  Command-line parameters
    Version 4.0 update:  Tweaks to support wider range of .csv formats generated by different versions of zoom
    Version 4.1 update:  Sort meeting-report-by-week correctly despite new year, i.e. order it oct-dec-jan-may rather than jan-dec

    Command line usage:
                One or more parameters giving paths to directory containing .csv files.
                    Each directory will be processed independently. The directory will be scanned for all files of the form
                    "participants*.csv" (downloaded from zoom - see below), and attendance reports will be generated in that same directory.
                    If no directories specified, script runs on current directory
                    (If you include *unquoted* wildcards in your command line paths, all matching directories will be processed independently)
                Optional parameters:
                -m2 (or another numeral) to specify that students must attend a minimum of 2 sessions.
                    Warnings will then be generated for any students not meeting that threshold
                    (but watch out for students who may have signed in under a personal and a university email address,
                     if these have not been successfully paired together)
                -am to specify that the Zoom reports have been generated using american date formats.

    Outputs:
    - a file 'meeting-report.csv' listing all the participants, and giving a chronological account of which meetings
      they have attended (and for how long). This report is just a list of all dates that the student has attended
      (and the total time they attended for).
    - a file 'meeting-report-by-day.csv' giving a table of participants vs dates attended.
    - a file 'meeting-report-by-week.csv' giving a table of participants vs weeks attended
      (helpful for labs where different students attend once each, on different days of the week)
    - warnings about low-attending students

    Limitations:
    - At the moment the output data is sorted by email address (which is not a particularly helpful ordering
      given that the students have numerical email addresses). However, email address seemed like the most
      reliable thing to use as a key, given that some students change their display name on Zoom.
    - Script currently assumes only one meeting per day, and will fuse into one entry if there was
      more than one separate 'participants' file referring to the same calendar date.
    - Signing in to zoom is no proof that the student is actually present or engaged at the computer at the other end!

    Customization (Glasgow University):
    - To merge records between students' personal and university email addresses,
      manually edit the code that initializes the dictionary 'emailMapping'.
      If you just run the script without this, the code will try and make suggestions about possible pairings it has guessed for you.
      You would then need to manually edit the emailMapping dictionary to add those pairings.
    - If it is hard to distinguish demonstrator emails from undergraduate student emails,
      you may need to edit knownDemonstratorEmails to manually identify demonstrator emails
    Additional customization (other universities):
    - Edit the function "StaffEmail()" to identify staff/demonstrator email addresses that should be excluded from the reports
    - Edit the function "UniversityStudentEmail()" to identify email addresses that are official university stuent email addresses

    To download the participants files:
    The meeting owner should go to "edit meeting" then Reports -> Usage and search for the relevant date range.
    This brings up a table of data about meetings. Click on the number (hyperlink) in the "Participants" column of the table,
    and that pops up a list of participants. Click "Export", then hit close/escape to go back to the main table.
    Repeat for the other meetings of interest, and then gather all the downloaded files into a suitable directory for processing.
'''

import numpy as np
import csv, datetime, glob, sys

def StaffEmail(email):
    # Returns True if this looks like a staff email, so this can be excluded from the attendance report.
    # If staff emails are indistinguishable in format from student emails, or you don't want to bother with this,
    # just return False from this function. In that case, the worst that will happen is you'll get attendance reports
    # for staff/demonstrators as well.
    # If you can't distinguish some or all demonstrator emails, but want to enter them manually, then
    # knownDemonstratorEmails can contain a manually-curated list of demonstrator emails
    # that would be otherwise indistinguishable from undergraduate email addresses.
    knownDemonstratorEmails = []
    return ("@glasgow.ac.uk" in email) or           \
           ("@research.glasgow.ac.uk" in email) or  \
           ("@gla.ac.uk" in email) or               \
           ("@research.gla.ac.uk" in email) or      \
           (email in knownDemonstratorEmails)

def UniversityStudentEmail(email):
    # Returns True if this looks like an official university student email, as opposed to a personal email address.
    # If you can't specifically distinguish student email addresses from staff, just return True for all university email addresses.
    return "@student.gla.ac.uk" in email

def date_from_isoweek(iso_year, iso_weeknumber, iso_weekday):
    # From stackoverflow ( https://stackoverflow.com/questions/304256/whats-the-best-way-to-find-the-inverse-of-datetime-isocalendar )
    return datetime.datetime.strptime(
        '{:04d} {:02d} {:d}'.format(iso_year, iso_weeknumber, iso_weekday),
        '%G %V %u').date()

# Manually curated list of email pairs for students who I have noticed switched from personal to GU emails.
# e.g. add entries like:
#     "easyrider2001@hotmail.com": "1234567a@student.gla.ac.uk",
# The code later, commented "Useful utility routine", can help by providing automatic suggestions of matches to add here.
# However, those suggestions rely on the student giving themselves a correct and clear display name
# alongside their personal email address, so it won't always succeed in spotting pairings.
emailMapping = { }
reverseEmailMapping = {v: k for k, v in emailMapping.items()}
dateFormat = '%d/%m/%Y %H:%M:%S %p'

directoriesToProcess = []
warningThreshold = 0
processDefaultDirectory = True
for arg in sys.argv[1:]:
    if arg.startswith("-m"):
        if (len(arg) == 2):
            print("Usage: rerun with e.g. \"-m4\" to warn for students who have attended <=4 sessions")
        else:
            print("Will warn for students who have attended <={0} sessions".format(arg[2:]))
            warningThreshold = int(arg[2:])
    elif arg == "-am":
        print("Will expect american date formats")
        dateFormat = '%m/%d/%Y %H:%M:%S %p'
    else:
        processDefaultDirectory = False
        if "*" in arg:
            print("Warning: ignoring quoted wildcard was passed in as a command line parameter \"{0}\".".format(arg))
            print("  That approach is not supported - use an unquoted wildcard if you want to process a batch of directories independently")
        else:
            directoriesToProcess.append(arg)

if processDefaultDirectory:
    print("No directories specified in command line arguments specified - processing current directory")
    directoriesToProcess = ["."]


for sourcePath in directoriesToProcess:
    print("\n===== Processing directory \"{0}\" =====".format(sourcePath))

    ##########################################
    ### Load all available meeting records ###
    ##########################################

    # Dictionary to accumulate all our data
    emails = dict()

    # List of input files to process
    filenames = glob.glob("{0}/participants*.csv".format(sourcePath))

    for fn in filenames:
        # First a quick peek to check the date in the file
        with open(fn, newline='') as csvfile:
            csvreader = csv.reader(csvfile, delimiter=',')
            row1 = next(csvreader)
            row2 = next(csvreader)
            print('Processing file {0} (date {1})'.format(fn, row2[2]))
        # Now process the file properly
        with open(fn, newline='') as csvfile:
            csvreader = csv.reader(csvfile, delimiter=',')
            for row in csvreader:
                if row[0].endswith('Name (Original Name)'):
                    # Skip header row.
                    # Note that the use of 'endswidth' avoids problems caused by a weird
                    # unicode character that Zoom puts at the very start of the .csv files it generates.
                    continue

                # Parse data row
                name = row[0]
                email = row[1].lower()  # Convert to lowercase because some students seemed to change that mid-semester
                if email in emailMapping:
                    emailKey = emailMapping[email]
                else:
                    emailKey = email
                # This gets a bit fiddly because, depending on settings, Zoom may generate reports with 3, 5 or 6 columns
                # This is an empirical effort to parse all versions I am aware of, though ideally maybe we should be
                # parsing the column header to identify what each column represents
                if (len(row) >= 5):
                    mins = int(row[4])
                    start = row[2]
                    try:
                        date_time_obj = datetime.datetime.strptime(start, dateFormat)
                    except:
                        print("*** ERROR ***: parsing of date failed. Are your dates in american format (mm/dd/yyyy)?")
                        print("If so, rerun this script with the '-am' option")
                        print("")
                        raise
                    date = date_time_obj.date()
                else:
                    start = fn
                    date = fn   # We don't have access to a date - just use the filename as a proxy for that
                    mins = int(row[2])

                # Create an entry if we have not encountered this student before
                if not emailKey in emails:
                    emails[emailKey] = dict()

                if date in emails[emailKey]:
                    # We already have an entry for this student on this date.
                    # Add the number of minutes from the current data line we have just read
                    emails[emailKey][date][3] += mins
                else:
                    # Create a new entry for this student on this date
                    emails[emailKey][date] = [name, email, start, mins]


    ################################################################################################
    ### Useful utility routine to spot who we have failed to match up to a Glasgow email address ###
    ################################################################################################
    for email in sorted(emails):
        entry = emails[email]

        if (not StaffEmail(email)) and (not UniversityStudentEmail(email)):
            firstEntry = next(iter(entry.values()))
            print("NOTE: student {0}, {1} not matched to university email address".format(firstEntry[0], firstEntry[1]))
            # Try and be helpful by seeing if we can find a match for the surname in an entry that *does* have a GU email address
            possibleSurname = firstEntry[0].split(' ')[-1]
            for email2 in sorted(emails):
                if UniversityStudentEmail(email2):
                    for date in emails[email2]:
                        thisEntry = emails[email2][date]
                        thisName = thisEntry[0]
                        if possibleSurname in thisName:
                            print(" Might match to {0}, {1}?".format(thisEntry[0], thisEntry[1]))
                            print(" If so, manually add table row \"{0}\": \"{1}\",".format(firstEntry[1], thisEntry[1]))
                            break

    #####################################################
    ### Generate the attendance list for all students ###
    #####################################################
    # Also generates warnings about low-attending students
    # who have attended <= the specified minimum number of sessions.
    outputStudentAttendanceOnly = True

    with open("{0}/meeting-report.csv".format(sourcePath), mode='w') as csvOutput:
        csvwriter = csv.writer(csvOutput, delimiter=',')
        for email in sorted(emails):
            if outputStudentAttendanceOnly and StaffEmail(email):
                continue

            entry = emails[email]

            # Write out data to meeting report
            for date in sorted(entry):
                row = entry[date]
                #print('{0}, {1}, {2}, {3}'.format(row[0], row[1], row[2], row[3]))
                csvwriter.writerow(row)
            # Monitor low-attending students
            if (not StaffEmail(email)) and (len(entry) <= warningThreshold):
                print("WARNING: student {0} {1} only attended {2} sessions".format(row[0], row[1], len(entry)))

    ########################################################
    ### Generate table of student name vs dates attended ###
    ########################################################

    # First identify all the meeting dates
    dateCatalogue = dict()
    for email in sorted(emails):
        entry = emails[email]
        for date in entry:
            if not date in dateCatalogue:
                dateCatalogue[date] = date

    # Now build up our table
    with open("{0}/meeting-report-by-date.csv".format(sourcePath), mode='w') as csvOutput:
        csvwriter = csv.writer(csvOutput, delimiter=',')
        outputRow = ["Name", "Email"]
        for date in sorted(dateCatalogue):
            outputRow.append(date)
        csvwriter.writerow(outputRow)

        for email in sorted(emails):
            if outputStudentAttendanceOnly and StaffEmail(email):
                continue

            studentRecord = emails[email]
            firstEntry = next(iter(studentRecord.values()))
            name = firstEntry[0]

            # Write out row to meeting report
            outputRow = [name, email]
            for date in sorted(dateCatalogue):
                if date in studentRecord:
                    row = studentRecord[date]
                    outputRow.append(row[3])
                else:
                    outputRow.append("")
            csvwriter.writerow(outputRow)

    ########################################################
    ### Generate table of student name vs weeks attended ###
    ########################################################

    # First identify all the meeting weeks
    weekCatalogue = dict()
    for email in sorted(emails):
        entry = emails[email]
        for date in entry:
            week = date.isocalendar()[1]
            if not week in weekCatalogue:
                weekStart = date_from_isoweek(date.isocalendar()[0],
                                              date.isocalendar()[1],
                                              1)
                weekCatalogue[week] = weekStart

    # Now build up our table
    # To deal with the january wraparound to week #1, we need to use a construction like this
    # to ensure we actually sort our week-date dictionary by calendar date, not week-of-the-year
    #     for (weekNum,date) in sorted(weekCatalogue.items(), key=lambda kv: kv[1]):
    with open("{0}/meeting-report-by-week.csv".format(sourcePath), mode='w') as csvOutput:
        csvwriter = csv.writer(csvOutput, delimiter=',')
        outputRow = ["Name", "Email"]
        for (weekNum,date) in sorted(weekCatalogue.items(), key=lambda kv: kv[1]):
            outputRow.append(weekCatalogue[weekNum])
        csvwriter.writerow(outputRow)

        for email in sorted(emails):
            if outputStudentAttendanceOnly and StaffEmail(email):
                continue

            studentRecord = emails[email]
            firstEntry = next(iter(studentRecord.values()))
            name = firstEntry[0]

            # Write out row to meeting report
            outputRow = [name, email]
            for (weekNum,date) in sorted(weekCatalogue.items(), key=lambda kv: kv[1]):
                print(weekNum)
                attendanceSum = 0
                for date in studentRecord:
                    studentEntryWeek = date.isocalendar()[1]
                    if (studentEntryWeek == weekNum):
                        attendanceSum += studentRecord[date][3]
                if (attendanceSum > 0):
                    outputRow.append(attendanceSum)
                else:
                    outputRow.append("")
            csvwriter.writerow(outputRow)